[PATCH] Optimise querying of checks in patch list view
dja at axtens.net
Thu Sep 15 18:10:29 AEST 2016
tl;dr: with about 300 mails from the patchwork list, according to
django-debug-toolbar, to render
Without this patch:
- ~1.35 seconds of CPU time
- 110 SQL queries, taking ~70ms
With this patch:
- < 0.3 seconds of CPU time
- 10 SQL queries, taking <20ms
How? Replace an .exclude() on a QuerySet with a list comprehension.
Yes, that's normally a pessimisation.
Surprisingly, it's an optimisation here.
Why? Where we're looking at patches in anything that uses a generic_list()
in the view, we do a prefetch_related('check_set'). But, if we then do a
.filter or a .exclude, that throws out the existing, cached information,
and does another query. (See the Django docs on prefetch_related.)
So, do it 'by hand' in Python instead.
Signed-off-by: Daniel Axtens <dja at axtens.net>
patchwork/models.py | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/patchwork/models.py b/patchwork/models.py
index 5043c9a031a8..e7538703ff86 100644
@@ -502,7 +502,17 @@ class Patch(Submission):
unique[user][ctx] = check
# filter out the "duplicates" or older, now-invalid results
- return self.check_set.all().exclude(id__in=duplicates)
+ # Why don't we use filter or exclude here? Surprisingly, it's
+ # an optimisation in the common case. Where we're looking at
+ # checks in anything that uses a generic_list() in the view,
+ # we do a prefetch_related('check_set'). But, if we then do a
+ # .filter or a .exclude, that throws out the existing, cached
+ # information, and does another query. (See the Django docs on
+ # prefetch_related.) So, do it 'by hand' in Python. We can
+ # also be confident that this won't be worse, seeing as we've
+ # just iterated over self.check_set.all() *anyway*.
+ return [c for c in self.check_set.all() if c.id not in duplicates]
More information about the Patchwork