Fixing N+1 query problem in Django

N+1 is a very common data-access pattern problem where a query is executed for every result of a previous query. If the definition looks cryptic, let's understand this problem with an example. We have created Country, Musician and Album tables with 20 rows in Album, 10 rows in Musician, and 3 rows in the Country table respectively.

from django.db import models


class Country(models.Model):
    name = models.CharField(max_length=50)

class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    instrument = models.CharField(max_length=100)
    country = models.ForeignKey(Country, on_delete=models.CASCADE)

class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    release_date = models.DateField()
    num_stars = models.IntegerField()
In [1]: Album.objects.count()
Out[1]: 20

In [2]: Musician.objects.count()
Out[2]: 10

In [3]: Country.objects.count()
Out[3]: 3

We will be using a decorator to print all the stats for the query.

from django.db import connection
from django.db import reset_queries


def database_debug(func):
    def inner_func(*args, **kwargs):
        reset_queries()
        results = func()
        query_info = connection.queries
        print('function_name: {}'.format(func.__name__))
        print('query_count: {}'.format(len(query_info)))
        queries = ['{}\n'.format(query['sql']) for query in query_info]
        print('queries: \n{}'.format(''.join(queries)))
        return results
    return inner_func
@database_debug
def check_query_performance():
    albums_qs = Album.objects.all()
    for album in albums_qs:
        return (album.name, album.artist.first_name)

Any guesses how many database queries will be fired for the above method? Well, it is 21, 1 to get all the albums and 20(N) queries more to get the first_name of the artist. This is therefore a 1+n query or popularly known as n+1 query problem. Let's check the debug data once to understand what are these 21 queries.

function_name: check_query_performance
query_count: 21
queries: 
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 5 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 6 LIMIT 21
.
.
.
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 2 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 8 LIMIT 21

How to fix n+1 query problem?

Now that we have seen and understood the n+1 query problem, let's now fix the issue. Obviously, the fix would be to get the rows in 1 go or get in batches. Lets check the available method which we can use to avoid n+1 query problem.

Method I: With select_related

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query.

@database_debug
def check_query_performance_with_select_related():
    albums_qs = Album.objects.select_related('artist').all()
    for album in albums_qs:
        return (album.name, album.artist.first_name)


check_query_performance_with_select_related()
function_name: check_query_performance
query_count: 1
queries: 
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars`, `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_album` INNER JOIN `common_musician` ON (`common_album`.`artist_id` = `common_musician`.`id`)

Method II: With prefetch_related

The same query again and this time let's use prefetch_related.

@database_debug
def check_query_performance_with_prefetch_related():
    albums_qs = Album.objects.prefetch_related('artist').all()
    for album in albums_qs:
        return (album.name, album.artist.first_name)


check_query_performance_with_prefetch_related()
function_name: check_query_performance_with_prefetch_related
query_count: 2
queries: 
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

Let's see, what's happening here, the query count is now 2, 1st query is to get all the albums and 2nd query is to get all the album's artist first_name. In prefetch_realted join happens in python, which is happening here, all distinct musician's data are collected and then called at once. Since prefetch_related does a separate lookup for each relationship it allows to prefetch many-to-many and many-to-one objects, which cannot be done using select_related

Conclusion:

  • If your querying something, in a loop, this becomes a case of n+1 query problem. The solution is either to have joins at database level or get all related fields using in query.
  • select_related uses native database joins, whereas in the case of prefetch_related additional queries are fired later and can be therefore said that joining happens in python rather than the database.
  • Since select_related uses database joins Django allows this only in case of OneToOneField or ForeignKeys , but, this restriction is not present in the case of prefetch_related. Therefore, all selected_related can be replaced with prefetch_related, however, another way round is not possible.
Select Related Vs Prefetch Related In Django
Django’s select_related and prefetch_related are ways to optimize DB queries when accessing related items. We will understand their differences in this blog.
I don’t understand the GraphQL N+1 problem
So just yesterday i started learning graphql its really interesting, and quite easy to learn and understand actually. i started reading some articles and i found the N+1 problem. i found this example