Fixing N+1 query problem in Django
N+1 is a very common data-access pattern problem where a query is executed for every result of a previous query. If the definition looks cryptic, let's understand this problem with an example. We have created Country
, Musician
and Album
tables with 20 rows in Album
, 10 rows in Musician
, and 3 rows in the Country
table respectively.
from django.db import models
class Country(models.Model):
name = models.CharField(max_length=50)
class Musician(models.Model):
first_name = models.CharField(max_length=50)
last_name = models.CharField(max_length=50)
instrument = models.CharField(max_length=100)
country = models.ForeignKey(Country, on_delete=models.CASCADE)
class Album(models.Model):
artist = models.ForeignKey(Musician, on_delete=models.CASCADE)
name = models.CharField(max_length=100)
release_date = models.DateField()
num_stars = models.IntegerField()
In [1]: Album.objects.count()
Out[1]: 20
In [2]: Musician.objects.count()
Out[2]: 10
In [3]: Country.objects.count()
Out[3]: 3
We will be using a decorator to print all the stats for the query.
from django.db import connection
from django.db import reset_queries
def database_debug(func):
def inner_func(*args, **kwargs):
reset_queries()
results = func()
query_info = connection.queries
print('function_name: {}'.format(func.__name__))
print('query_count: {}'.format(len(query_info)))
queries = ['{}\n'.format(query['sql']) for query in query_info]
print('queries: \n{}'.format(''.join(queries)))
return results
return inner_func
@database_debug
def check_query_performance():
albums_qs = Album.objects.all()
for album in albums_qs:
return (album.name, album.artist.first_name)
Any guesses how many database queries will be fired for the above method? Well, it is 21, 1 to get all the albums and 20(N) queries more to get the first_name
of the artist. This is therefore a 1+n
query or popularly known as n+1
query problem. Let's check the debug data once to understand what are these 21 queries.
function_name: check_query_performance
query_count: 21
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 5 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 6 LIMIT 21
.
.
.
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 2 LIMIT 21
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` = 8 LIMIT 21
How to fix n+1 query problem?
Now that we have seen and understood the n+1
query problem, let's now fix the issue. Obviously, the fix would be to get the rows in 1 go or get in batches. Lets check the available method which we can use to avoid n+1 query problem.
Method I: With select_related
select_related
works by creating an SQL join and including the fields of the related object in the SELECT
statement. For this reason, select_related
gets the related objects in the same database query.
@database_debug
def check_query_performance_with_select_related():
albums_qs = Album.objects.select_related('artist').all()
for album in albums_qs:
return (album.name, album.artist.first_name)
check_query_performance_with_select_related()
function_name: check_query_performance
query_count: 1
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars`, `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`instrument`, `common_musician`.`country_id` FROM `common_album` INNER JOIN `common_musician` ON (`common_album`.`artist_id` = `common_musician`.`id`)
Method II: With prefetch_related
The same query again and this time let's use prefetch_related
.
@database_debug
def check_query_performance_with_prefetch_related():
albums_qs = Album.objects.prefetch_related('artist').all()
for album in albums_qs:
return (album.name, album.artist.first_name)
check_query_performance_with_prefetch_related()
function_name: check_query_performance_with_prefetch_related
query_count: 2
queries:
SELECT `common_album`.`id`, `common_album`.`artist_id`, `common_album`.`name`, `common_album`.`release_date`, `common_album`.`num_stars` FROM `common_album`
SELECT `common_musician`.`id`, `common_musician`.`first_name`, `common_musician`.`last_name`, `common_musician`.`country_id` FROM `common_musician` WHERE `common_musician`.`id` IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
Let's see, what's happening here, the query count is now 2, 1st query is to get all the albums
and 2nd query is to get all the album's artist first_name. In prefetch_realted
join happens in python, which is happening here, all distinct musician's data are collected and then called at once. Since prefetch_related
does a separate lookup for each relationship it allows to prefetch many-to-many and many-to-one objects, which cannot be done using select_related
Conclusion:
- If your querying something, in a loop, this becomes a case of n+1 query problem. The solution is either to have joins at database level or get all related fields using
in
query. select_related
uses native database joins, whereas in the case ofprefetch_related
additional queries are fired later and can be therefore said that joining happens in python rather than the database.- Since
select_related
uses database joins Django allows this only in case ofOneToOneField
orForeignKeys
, but, this restriction is not present in the case ofprefetch_related
. Therefore, all selected_related can be replaced with prefetch_related, however, another way round is not possible.