[PATCH 1/4] REST: massively improve the patch counting query under filters

Daniel Axtens dja at axtens.net
Wed Mar 18 00:59:13 AEDT 2020

The DRF web view counts the patches as part of pagination.

The query it uses is a disaster zone:

SELECT COUNT(*) FROM (SELECT DISTINCT `patchwork_submission`.`id` AS Col1, `patchwork_submission`.`msgid` AS Col2,
`patchwork_submission`.`date` AS Col3, `patchwork_submission`.`submitter_id` AS Col4, `patchwork_submission`.`project_id` AS Col5,
`patchwork_submission`.`name` AS Col6, `patchwork_patch`.`submission_ptr_id` AS Col7, `patchwork_patch`.`commit_ref` AS Col8,
`patchwork_patch`.`pull_url` AS Col9, `patchwork_patch`.`delegate_id` AS Col10, `patchwork_patch`.`state_id` AS Col11,
`patchwork_patch`.`archived` AS Col12, `patchwork_patch`.`hash` AS Col13, `patchwork_patch`.`patch_project_id` AS Col14,
`patchwork_patch`.`series_id` AS Col15, `patchwork_patch`.`number` AS Col16, `patchwork_patch`.`related_id` AS Col17 FROM
`patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) WHERE
`patchwork_submission`.`project_id` = 1) subquery

This is because django-filters adds a DISTINCT qualifier on a
ModelMultiChoiceFilter by default. I guess it makes sense and they do a
decent job of justifying it, but it causes the count to be made with
this awful subquery. (The justification is that they don't know if you're
filtering on a to-many relationship, in which case there could be
duplicate values that need to be removed.)

While fixing that, we can also tell the filter to filter on patch_project
rather than submission's project, which allows us in some cases to avoid
the join entirely.

The resultant SQL is beautiful when filtering by project only:

SELECT COUNT(*) AS `__count`
  FROM `patchwork_patch`
 WHERE `patchwork_patch`.`patch_project_id` = 1

On my test setup (2x canonical kernel mailing list in the db, warm cache,
my laptop) this query goes from >1s to ~10ms, a ~100x improvement.

If we filter by project and date the query is still nice, but still also
very slow:

SELECT COUNT(*) AS `__count`
  FROM `patchwork_patch`
 INNER JOIN `patchwork_submission`
    ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`)
 WHERE (`patchwork_patch`.`patch_project_id` = 1 AND `patchwork_submission`.`date` >= '2010-11-01 00:00:00')

This us from ~1.3s to a bit under 400ms - still not ideal, but I'll take
the 3x improvement!

Reported-by: Konstantin Ryabitsev <konstantin at linuxfoundation.org>
Signed-off-by: Daniel Axtens <dja at axtens.net>
 patchwork/api/filters.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/api/filters.py b/patchwork/api/filters.py
index 60f1a3634fcf..a3b6f03c6ccd 100644
--- a/patchwork/api/filters.py
+++ b/patchwork/api/filters.py
@@ -188,7 +188,8 @@ class CoverLetterFilterSet(TimestampMixin, BaseFilterSet):
 class PatchFilterSet(TimestampMixin, BaseFilterSet):
-    project = ProjectFilter(queryset=Project.objects.all())
+    project = ProjectFilter(queryset=Project.objects.all(), distinct=False,
+                            field_name='patch_project')
     # NOTE(stephenfin): We disable the select-based HTML widgets for these
     # filters as the resulting query is _huge_
     series = BaseFilter(queryset=Series.objects.all(),

More information about the Patchwork mailing list