Fixing N+1 query problem in Django
N+1 is a very common data-access pattern problem where a query is executed for every result of a previous query. If the definition looks cryptic, let's understand this problem with an example. We have created Country, Musician and Album tables with 20 rows in Album, 10 rows in Musician, and 3 rows in the Country table respectively.
from django.db import models
class Country(models.Model):
name = models.CharField(max_length=50)
class Musician(models.Model):
first_name = models.CharField(max_length=50)
last_name = models.CharField(max_length=50)
instrument = models.CharField(max_length=100)
country = models.ForeignKey(Country, on_delete=models.CASCADE)
class Album(models.Model):
artist = models.ForeignKey(Musician, on_delete=models.CASCADE)
name = models.CharField(max_length=100)
release_date = models.DateField()
num_stars = models.IntegerField()In [1]: Album.objects.count()
Out[1]: 20
In [2]: Musician.objects.count()
Out[2]: 10
In [3]: Country.objects.count()
Out[3]: 3We will be using a decorator to print all the stats for the query.
from django.db import connection
from django.db import reset_queries
def database_debug(func):
def inner_func(*args, **kwargs):
reset_queries()
results = func()
query_info = connection.queries
print('function_name: {}'.format(func.__name__))
print('query_count: {}'.format(len(query_info)))
queries = ['{}\n'.format(query['sql']) for query in query_info]
print('queries: \n{}'.format(''.join(queries)))
return results
return inner_func
@database_debug
def check_query_performance():
albums_qs = Album.objects.all()
for album in albums_qs:
return (album.name, album.artist.first_name)
Any guesses how many database queries will be fired for the above method? Well, it is 21, 1 to get all the albums and 20(N) queries more to get the first_name of the artist. This is therefore a 1+n query or popularly known as n+1 query problem. Let's check the debug data once to understand what are these 21 queries.
function_name: check_query_performance
query_count: 21
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 5 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 6 LIMIT 21
.
.
.
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 2 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 8 LIMIT 21How to fix n+1 query problem?
Now that we have seen and understood the n+1 query problem, let's now fix the issue. Obviously, the fix would be to get the rows in 1 go or get in batches. Lets check the available method which we can use to avoid n+1 query problem.
Method I: With select_related
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query.
@database_debug
def check_query_performance_with_select_related():
albums_qs = Album.objects.select_related('artist').all()
for album in albums_qs:
return (album.name, album.artist.first_name)
check_query_performance_with_select_related()function_name: check_query_performance
query_count: 1
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars`, `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_album` INNER JOIN `common_musician` ON (`common_album`.`artist_id` = `common_musician`.`id`)Method II: With prefetch_related
The same query again and this time let's use prefetch_related.
@database_debug
def check_query_performance_with_prefetch_related():
albums_qs = Album.objects.prefetch_related('artist').all()
for album in albums_qs:
return (album.name, album.artist.first_name)
check_query_performance_with_prefetch_related()function_name: check_query_performance_with_prefetch_related
query_count: 2
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
Let's see, what's happening here, the query count is now 2, 1st query is to get all the albums and 2nd query is to get all the album's artist first_name. In prefetch_realted join happens in python, which is happening here, all distinct musician's data are collected and then called at once. Since prefetch_related does a separate lookup for each relationship it allows to prefetch many-to-many and many-to-one objects, which cannot be done using select_related
Conclusion:
- If your querying something, in a loop, this becomes a case of n+1 query problem. The solution is either to have joins at database level or get all related fields using
inquery. select_relateduses native database joins, whereas in the case ofprefetch_relatedadditional queries are fired later and can be therefore said that joining happens in python rather than the database.- Since
select_relateduses database joins Django allows this only in case ofOneToOneFieldorForeignKeys, but, this restriction is not present in the case ofprefetch_related. Therefore, all selected_related can be replaced with prefetch_related, however, another way round is not possible.



