Utilising caching in your applications
While django-cms takes care of caching pages, placeholders and plugins, sometimes you need to integrate your own applications. So you might need to consider how efficient your queries are. When writing a django application, minimising the number of database queries you make is a significant step towards an efficient system. A fast & efficient system will provide a better experience to users.
To minimise the number of database queries you make, there are a number of things you can do. The easy ones are making use of select_related and prefetch_related to allow you to select related objects from the database while you retrieve the data that you’re interested in. This is something you’ll need to consider for any applications you hook into django-cms.
select_related allows you to retrieve objects which have a ForeignKey or OneToOneField relationship to the data that you’re retrieving from the database.
prefetch_related allows you to retrieve a set of objects, so a ManyToManyField relationship or a reverse ForeignKey relationship.
To illustrate how this is used, consider the following models:
    from django.db import models


    class Author(models.Model):

        class Meta:
            verbose_name = 'Author'
            verbose_name_plural = 'Authors'

        name = models.CharField(max_length=255)


    class Book(models.Model):

        class Meta:
            verbose_name = 'Book'
            verbose_name_plural = 'Books'

        name = models.CharField(max_length=255)
        author = models.ForeignKey(Author, on_delete=models.PROTECT)
        slug = models.SlugField(
            unique=True,
            db_index=True,
            max_length=255,
        )


    class Shop(models.Model):

        class Meta:
            verbose_name = 'Shop'
            verbose_name_plural = 'Shops'

        name = models.CharField(max_length=255)
        books = models.ManyToManyField(Book)
A view to list all the books might then look like this:
    from django.views.generic import ListView

    from .models import Book


    class BookListView(ListView):
        model = Book
        queryset = Book.objects.all()
The template would then iterate over each book in the queryset to show its details:
    {% extends "base.html" %}

    {% block content %}
    <h1>Books</h1>
    {% if object_list %}
    <ul>
      {% for book in object_list %}
      <li>
        <a href="{{ book.get_absolute_url }}">{{ book.name }}</a>
        ({{ book.author.name }})
      </li>
      {% endfor %}
    </ul>
    {% else %}
    <p>There are no books.</p>
    {% endif %}
    {% endblock %}
This might look ok, but accessing book.author makes Django run a separate query to fetch the author for every book in the queryset. So a list of 100 books costs 101 queries: one for the books, then one per row for its author. You really don’t want to do that.
The solution to this is to modify the queryset in the view to utilise select_related.
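    from django.views import generic

    from .models import Book


    class BookListView(generic.ListView):
        model = Book
        # fetch each book's author in the same query via a SQL join
        queryset = Book.objects.select_related('author')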
This ensures that the related author objects are collected in the same query that collects the books. Much more efficient.
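To make the difference concrete, here is a rough sketch that skips Django entirely and talks to an in-memory SQLite database with Python's sqlite3 module. The table layout, the sample rows and the run() helper are all made up for illustration; the JOIN is only an approximation of the kind of SQL that select_related asks for, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        name TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it (hypothetical helper for this sketch)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def list_books_naive():
    # One query for the books, then one more per book for its author.
    rows = []
    for book_name, author_id in run("SELECT name, author_id FROM book ORDER BY id"):
        (author_name,) = run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
        rows.append((book_name, author_name))
    return rows


def list_books_joined():
    # Books and authors collected together in a single JOIN.
    return run(
        "SELECT book.name, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )


query_count = 0
naive = list_books_naive()
naive_queries = query_count

query_count = 0
joined = list_books_joined()
joined_queries = query_count

print(naive_queries, joined_queries)
```

Both functions return the same rows, but the naive version's query count grows with the number of books while the joined version stays at one.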
Another view might list all of the shops in the database, and this is where prefetch_related comes in, because each shop has many related books. Rather than doing it all in a single SQL join, Django runs a separate query for the related objects and matches them up to their shops in Python.
    from .models import Shop


    class ShopListView(generic.ListView):
        model = Shop
        queryset = Shop.objects.all()
The template would iterate over each shop in the queryset, and over each shop’s books, to show the details:
    {% extends "base.html" %}

    {% block content %}
    <h1>Shops</h1>
    {% if object_list %}
    <ul>
      {% for shop in object_list %}
      <li>
        <a href="{{ shop.get_absolute_url }}">{{ shop.name }}</a>
        <ul>
          {% for book in shop.books.all %}
          <li>{{ book.name }}</li>
          {% endfor %}
        </ul>
      </li>
      {% endfor %}
    </ul>
    {% else %}
    <p>There are no shops.</p>
    {% endif %}
    {% endblock %}
This template would, like the books before, run an extra database query for every shop in the list, this time to fetch its books. But again, the solution is really simple:
    class ShopListView(generic.ListView):
        model = Shop
        # fetch the books for all shops in one extra query
        queryset = Shop.objects.prefetch_related('books')
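If you're curious what "one extra query, joined in Python" looks like, here is a minimal sketch of the idea using sqlite3 directly rather than Django. The tables, the shop_books link table name and the sample data are assumptions for the example, and Django's real prefetch machinery is more involved than this.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shop_books (shop_id INTEGER, book_id INTEGER);
    INSERT INTO shop VALUES (1, 'North'), (2, 'South');
    INSERT INTO book VALUES (1, 'Django Basics'), (2, 'Caching 101');
    INSERT INTO shop_books VALUES (1, 1), (1, 2), (2, 2);
""")

# Query 1: the shops themselves.
shops = conn.execute("SELECT id, name FROM shop ORDER BY id").fetchall()

# Query 2: every book for those shops in one go, using an IN clause.
shop_ids = [shop_id for shop_id, _ in shops]
placeholders = ", ".join("?" for _ in shop_ids)
links = conn.execute(
    "SELECT shop_books.shop_id, book.name FROM shop_books "
    "JOIN book ON book.id = shop_books.book_id "
    f"WHERE shop_books.shop_id IN ({placeholders}) ORDER BY book.id",
    shop_ids,
).fetchall()

# The "join" happens in Python: group the books by the shop they belong to.
books_by_shop = defaultdict(list)
for shop_id, book_name in links:
    books_by_shop[shop_id].append(book_name)

listing = {name: books_by_shop[shop_id] for shop_id, name in shops}
print(listing)
```

However many shops there are, this stays at two queries, which is the behaviour you want to see from the shop list view once prefetch_related is in place.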
A really vital tool when assessing the database efficiency of a view is the SQL panel of django-debug-toolbar. It will give you a count of the queries made to render the page and you can inspect these to find out where the queries come from, how long they take to execute and more. This is where it becomes really easy to see where you may have missed the use of the above functionality because you see repeated queries listed.
Caching
Once your application is making efficient queries, minimising the impact on the database, you can then start to consider where you might want to cache that data retrieved from the database.
If you don’t cache the data, every request to a view which requires data from the database will have to query for that data. So if you have a website which gets a lot of traffic, you’ll soon find the limits of your database server.
If you’re unfamiliar, a cache is usually a system that allows us to store data in memory, because retrieving data from memory is much faster than getting it from a database. The most common caches will be local memory on the web server (known as "locmem" to Django), memcached or redis. Django’s caching documentation covers each of these in detail. It’s worth noting that it is also possible to use a table in the database or the file system for caching.
Local memory caching might be a good place to start if you’re new to caching because the others require services to be running, typically on separate servers to get the best out of them. But local memory caching has its drawbacks, because it’s using the memory of the web server & that takes away from the memory available to the application. Therefore you should avoid doing this in production. It also means that if you’re running multiple servers they can’t share a cache, so each machine can only access what it’s cached to its own memory. By operating a memcached or redis cache you can have dedicated hardware for your cache which allows your application to be scalable.
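As a sketch of how that choice might look in settings, the snippet below defines a local memory cache for development and a shared redis cache for production. The two backend paths are Django's built-in ones (the redis backend ships with Django 4.0 and later); the cache name, the redis host and the DEBUG switch are assumptions for the example, so adjust them to your own setup.

```python
# settings.py (sketch)

# Development: per-process memory cache, no extra services needed.
LOCMEM_CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "dev-cache",  # hypothetical name for this cache instance
    }
}

# Production: a shared redis cache that every web server can reach.
REDIS_CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://cache.example.internal:6379",  # hypothetical host
    }
}

# Django reads the setting named CACHES, so pick one per environment.
DEBUG = True  # assumption for this sketch
CACHES = LOCMEM_CACHES if DEBUG else REDIS_CACHES
```

Because both configurations use the "default" alias, the application code that calls the cache doesn't need to know which backend is behind it.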
Cache settings in django-cms
The following settings are enabled by default and the respective parts of the cms are cached based on each setting:
CMS_CACHE_DURATIONS
A dictionary defining various cache durations by the following keys: content, menus and permissions.
CMS_CACHE_PREFIX
This is a setting you should customise if you share a cache between different django-cms applications.
Further cache settings can be found in the documentation.
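For illustration, the two settings above might be set like this. The key names and the durations (in seconds) are what I understand the django-cms defaults to be, and the comments are my own summary, so check them against the documentation for your version; the prefix value is a made-up example.

```python
# settings.py (sketch)

CMS_CACHE_DURATIONS = {
    "content": 60,        # rendered placeholder and plugin content
    "menus": 3600,        # the menu tree
    "permissions": 3600,  # permission lookups
}

# Hypothetical value: anything unique to this site will do, so that two
# django-cms projects sharing one cache don't read each other's keys.
CMS_CACHE_PREFIX = "bookshop-cms-"
```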
Utilising a cache in your application
Once you have configured a caching service you need to fill it with data. That way, when a request comes into a view which performs a database query, it might be able to find that data in the cache and not have to go to the database for it.
The Book model has a slug field which is used in the URL patterns and this is a good attribute to cache data on because it’s unique. Considering what data is unique like that is going to be important with a cache implementation because once you store something you need to be able to create the key again in order to get that data back out from the cache.
In the application there may be a detail view for a specific book that uses a slug in the path and this can be used to get the book from cache:
    # urls.py
    from django.urls import path

    from .views import BookDetailView

    urlpatterns = [
        path(
            '<slug:slug>/',
            view=BookDetailView.as_view(),
            name='detail',
        ),
    ]
    # views.py
    from django.http import Http404
    from django.views.generic import TemplateView

    from .models import Book


    class BookDetailView(TemplateView):
        """
        Book detail view
        """
        # TemplateView provides get() and get_context_data();
        # the template path here is a hypothetical example
        template_name = 'books/book_detail.html'

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            # these are initialised in dispatch()
            self.book_slug = None
            self.book = None

        def dispatch(self, request, *args, **kwargs):
            self.book_slug = kwargs.get('slug')
            if not self.book_slug:
                # something badly wrong with URL routing...
                raise Http404

            self.book = Book.cached_by_slug(self.book_slug)
            if not self.book:
                # book not found
                raise Http404

            return super().dispatch(request, *args, **kwargs)

        def get_context_data(self, **kwargs):
            """
            Add the book to the context
            """
            context = super().get_context_data(**kwargs)
            context['book'] = self.book
            return context
The view above uses a static method on the Book model called cached_by_slug which you can see below. This creates a key using the provided slug from the URL which is used to query the cache. If the key is in the cache, the result is returned. If the key is not in the cache, the database is queried and the result is then set to cache and the result returned. A book that doesn’t exist is cached too, as a NotFound marker, so repeated requests for a bad slug don’t keep hitting the database.
    from django.core.cache import cache
    from django.db import models

    CACHED_BOOK_BY_SLUG_KEY = 'book__by_slug__{}'
    CACHED_BOOK_LENGTH = 24 * 3600  # 24hrs


    class NotFound:
        """ Used by the caching to record that a lookup found nothing """


    class Book(models.Model):

        class Meta:
            verbose_name = 'Book'
            verbose_name_plural = 'Books'

        name = models.CharField(max_length=255)
        author = models.ForeignKey(Author, on_delete=models.PROTECT)
        slug = models.SlugField(
            unique=True,
            db_index=True,
            max_length=255,
        )

        @staticmethod
        def cached_by_slug(slug):
            """
            Return the book (from the cache if possible) via the specified slug

            :param slug: the book slug of the book to retrieve
            :type slug: str
            :return: the Book with the specified slug, from the cache if
                possible, else from the database. Will return None if not found
            :rtype: Book or NoneType
            """
            key = CACHED_BOOK_BY_SLUG_KEY.format(slug)

            book = cache.get(key)
            if book is not None:
                # cache hit: either a real book or the NotFound marker
                if isinstance(book, NotFound):
                    return None
                return book

            # cache miss: fall back to the database and remember the result
            book = Book.objects.filter(slug=slug).first()
            if not book:
                cache.set(key, NotFound(), CACHED_BOOK_LENGTH)
                return None

            cache.set(key, book, CACHED_BOOK_LENGTH)
            return book
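If you want to play with this pattern without a Django project, here is a minimal sketch in plain Python. DictCache is a hypothetical stand-in for Django's cache object (only get and set, and it ignores the timeout), and DATABASE with book_from_db stands in for the Book table and the ORM query.

```python
class NotFound:
    """Marker cached in place of a book that doesn't exist."""


class DictCache:
    """Hypothetical stand-in for Django's cache: get/set only, no expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


cache = DictCache()
DATABASE = {"dune": "Dune"}  # pretend table: slug -> book
db_hits = 0


def book_from_db(slug):
    """Pretend database query that counts how often it is called."""
    global db_hits
    db_hits += 1
    return DATABASE.get(slug)


def cached_by_slug(slug):
    key = "book__by_slug__{}".format(slug)
    book = cache.get(key)
    if book is not None:
        # cache hit: either a real book or the NotFound marker
        return None if isinstance(book, NotFound) else book
    # cache miss: ask the database and remember the answer either way
    book = book_from_db(slug)
    cache.set(key, book if book is not None else NotFound(), 24 * 3600)
    return book


first = cached_by_slug("dune")
second = cached_by_slug("dune")
missing = [cached_by_slug("nope"), cached_by_slug("nope")]
print(first, second, missing, db_hits)
```

Four lookups only reach the pretend database twice: once for the book that exists and once for the slug that doesn't. Caching the NotFound marker is what stops the second bad lookup from going back to the database.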
Cache invalidation
Arguably the most important thing about caching is being able to invalidate your cached data. When the data in the database is modified, you need to be able to tell your application that related data in the cache is no longer valid and that the next time cached data is requested, it needs a fresh copy from the database first.
This can be done via django signals, and the best place to include these is often below your models, where signals specific to the model(s) can then be received. A signal receiver is just a function, so you can use these to clear the cache, or even run other functions like creating related objects or sending emails.
A simple example of cache invalidation for the Book model would include signals from saving or deleting a book instance:
    from django.core.cache import cache
    from django.db import models
    from django.db.models.signals import post_delete, post_save
    from django.dispatch import receiver

    CACHED_BOOK_BY_SLUG_KEY = 'book__by_slug__{}'
    CACHED_BOOK_LENGTH = 24 * 3600  # 24hrs


    class NotFound:
        """ Used by the caching to record that a lookup found nothing """


    class Book(models.Model):

        class Meta:
            verbose_name = 'Book'
            verbose_name_plural = 'Books'

        name = models.CharField(max_length=255)
        author = models.ForeignKey(Author, on_delete=models.PROTECT)
        slug = models.SlugField(
            unique=True,
            db_index=True,
            max_length=255,
        )

        @staticmethod
        def cached_by_slug(slug):
            key = CACHED_BOOK_BY_SLUG_KEY.format(slug)

            book = cache.get(key)
            if book is not None:
                if isinstance(book, NotFound):
                    return None
                return book

            book = Book.objects.filter(slug=slug).first()
            if not book:
                cache.set(key, NotFound(), CACHED_BOOK_LENGTH)
                return None

            cache.set(key, book, CACHED_BOOK_LENGTH)
            return book


    @receiver((post_delete, post_save), sender=Book)
    def invalidate_book_cache(sender, instance, **kwargs):
        """
        Invalidate the book cached data when a book is changed or deleted
        """
        cache.delete(CACHED_BOOK_BY_SLUG_KEY.format(instance.slug))
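To see why the receiver matters, here is a small plain-Python sketch of the same idea. The cache and database dictionaries and the save_book helper are stand-ins invented for this example; the invalidate flag just lets you compare saving with and without clearing the cache.

```python
cache = {}                   # stand-in for Django's cache
database = {"dune": "Dune"}  # stand-in for the Book table: slug -> name


def key_for(slug):
    return "book__by_slug__{}".format(slug)


def get_book(slug):
    """Read through the cache, falling back to the database on a miss."""
    key = key_for(slug)
    if key not in cache:
        cache[key] = database.get(slug)
    return cache[key]


def save_book(slug, name, invalidate=True):
    """Write to the database and, optionally, drop the cached copy."""
    database[slug] = name
    if invalidate:
        # the equivalent of cache.delete(key) in the signal receiver
        cache.pop(key_for(slug), None)


before = get_book("dune")
save_book("dune", "Dune (2nd edition)", invalidate=False)
stale = get_book("dune")   # still the old name: nothing told the cache
save_book("dune", "Dune (3rd edition)")
fresh = get_book("dune")   # cache was cleared, so this comes from the database
print(before, stale, fresh)
```

One thing to watch with the signal receiver above: it builds the key from the slug the instance has at the time of the signal. If a book's slug is edited, the entry cached under the old slug isn't deleted and will hang around until it expires.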
Hopefully I've covered the core concepts here, but it's a complex topic that you could write so much on. Please get in touch if you've any feedback. You can reach me on twitter or our Slack.