[PATCH] Fix slow Patch counting query

Daniel Axtens dja at axtens.net
Wed Mar 28 09:42:33 AEDT 2018


Stephen Finucane <stephen at that.guru> writes:

> On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
>> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
>> the following query is really slow on OzLabs:
>> 
>> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>>     INNER JOIN "patchwork_submission" ON
>>         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
>>     WHERE ("patchwork_submission"."project_id" = 14 AND
>>            "patchwork_patch"."state_id" IN
>> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>>                 WHERE U0."action_required" = true
>> 		ORDER BY U0."ordering" ASC));
>> 
>> I think this is really slow because we have to join the patch and
>> submission table to get the project id, which we need to filter the
>> patches.
>> 
>> Duplicate the project id in the patch table itself, which allows us to
>> avoid the JOIN.
>> 
>> The new query reads as:
>> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
>>     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
>>            "patchwork_patch"."state_id" IN
>> 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
>> 	        WHERE U0."action_required" = true
>> 		ORDER BY U0."ordering" ASC));
>> 
>> Very simple testing on a small, artificial Postgres instance (3
>> projects, 102711 patches) shows speed gains of ~1.5-5x for this
>> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
>> query vs the second query, we see a ~1.75x improvement there too.
>> 
>> I suspect the gains will be bigger on OzLabs.
>> 
>> (It turns out all of this is for the "| NN patches" counter we
>> added to the filter bar!!)
>> 
>> Reported-by: Stephen Rothwell <sfr at canb.auug.org.au>
>> Signed-off-by: Daniel Axtens <dja at axtens.net>
>
> It's unfortunate that this has already merged. While it works, it
> defeats the whole point of the multi-table inheritance introduced in
> commit 86172ccc1 (normalization). To be honest, given the performance
> impacts that particular change introduced (which we're only seeing at
> scale), I'd rather denormalize the whole thing and fold the 'Patch' and
> 'CoverLetter' models back into 'Submission' and just use a 'type' field
> (or similar) to control behavior. Is there any reason not to do this?

I agree that would be a better conceptual solution. I didn't do it
because I tried a couple of times and couldn't get it all to work, and
I haven't had the time to really sit down and re-engineer it.

If you can get it to work I am happy to revert this and apply that; the
changes to the database schema aren't at all difficult to undo so any
brave souls running master won't be completely hosed.
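
For anyone who wants to sanity-check the duplicated column before or
after a revert, here is a minimal sketch of the equivalence between the
two counting queries. It makes several assumptions: it uses Python's
sqlite3 rather than Postgres, the tables are a cut-down stand-in for the
real schema, the data is made up, and the helper names (build_demo_db,
counts) are hypothetical. The ORDER BY in the subquery is left out since
it does not affect a COUNT. It says nothing about speed; it only shows
that the backfill from the migration makes both queries agree.

```python
import sqlite3

# Cut-down stand-in for the real tables (assumption, not the full schema).
SCHEMA = """
CREATE TABLE patchwork_state (
    id INTEGER PRIMARY KEY, ordering INTEGER, action_required INTEGER);
CREATE TABLE patchwork_submission (
    id INTEGER PRIMARY KEY, project_id INTEGER NOT NULL);
CREATE TABLE patchwork_patch (
    submission_ptr_id INTEGER PRIMARY KEY REFERENCES patchwork_submission (id),
    state_id INTEGER REFERENCES patchwork_state (id),
    patch_project_id INTEGER);
"""

# Same backfill statement as the RunSQL step in the migration.
BACKFILL = """
UPDATE patchwork_patch SET patch_project_id =
    (SELECT project_id FROM patchwork_submission
     WHERE patchwork_submission.id = patchwork_patch.submission_ptr_id)
"""

# Old query: needs the JOIN to reach project_id.
COUNT_WITH_JOIN = """
SELECT COUNT(*) FROM patchwork_patch
    INNER JOIN patchwork_submission ON
        (patchwork_patch.submission_ptr_id = patchwork_submission.id)
    WHERE patchwork_submission.project_id = ? AND
          patchwork_patch.state_id IN
              (SELECT U0.id FROM patchwork_state U0
               WHERE U0.action_required = 1)
"""

# New query: filters on the duplicated column, no JOIN.
COUNT_WITHOUT_JOIN = """
SELECT COUNT(*) FROM patchwork_patch
    WHERE patchwork_patch.patch_project_id = ? AND
          patchwork_patch.state_id IN
              (SELECT U0.id FROM patchwork_state U0
               WHERE U0.action_required = 1)
"""


def build_demo_db():
    """Create an in-memory database with made-up data and run the backfill."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO patchwork_state VALUES (?, ?, ?)",
        [(1, 0, 1), (2, 1, 1), (3, 2, 0)],
    )
    for sub_id in range(1, 301):
        conn.execute(
            "INSERT INTO patchwork_submission VALUES (?, ?)",
            (sub_id, sub_id % 3 + 1),
        )
        if sub_id % 5:  # every fifth submission stands in for a cover letter
            conn.execute(
                "INSERT INTO patchwork_patch VALUES (?, ?, NULL)",
                (sub_id, sub_id // 3 % 3 + 1),
            )
    conn.execute(BACKFILL)
    return conn


def counts(conn, project_id):
    """Return (count via JOIN, count via duplicated column) for one project."""
    joined = conn.execute(COUNT_WITH_JOIN, (project_id,)).fetchone()[0]
    flat = conn.execute(COUNT_WITHOUT_JOIN, (project_id,)).fetchone()[0]
    return joined, flat


if __name__ == "__main__":
    demo = build_demo_db()
    for pid in (1, 2, 3):
        print(pid, counts(demo, pid))
```

On a real instance the same comparison could be run per project with
the two queries quoted below; any mismatch would point at rows where
patch_project_id and the submission's project_id have drifted apart.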

Regards,
Daniel

> If not, I'd like to wait on 2.1 until we've done both this and the
> event fixes.
>
> Stephen
>
>> ---
>> 
>> This requires a migration, so I don't think we can feasibly do it as a
>> stable update.
>> 
>> I think we drop the patch counter for stable and try to get this and
>> the event stuff merged to master promptly, and just tag 2.1. (To that
>> end, I will re-read and finish reviewing the event stuff soon.)
>> ---
>>  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
>>  patchwork/models.py                              |  4 +++
>>  patchwork/parser.py                              |  1 +
>>  patchwork/views/__init__.py                      |  2 +-
>>  4 files changed, 45 insertions(+), 1 deletion(-)
>>  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>> 
>> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
>> new file mode 100644
>> index 000000000000..76d8f144c9dd
>> --- /dev/null
>> +++ b/patchwork/migrations/0024_patch_patch_project.py
>> @@ -0,0 +1,39 @@
>> +# -*- coding: utf-8 -*-
>> +# Generated by Django 1.11.10 on 2018-03-08 01:51
>> +from __future__ import unicode_literals
>> +
>> +from django.db import migrations, models
>> +import django.db.models.deletion
>> +
>> +
>> +class Migration(migrations.Migration):
>> +    # per migration 16, but note this seems to be going away
>> +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
>> +    atomic = False
>> +
>> +    dependencies = [
>> +        ('patchwork', '0023_timezone_unify'),
>> +    ]
>> +
>> +    operations = [
>> +        migrations.AddField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +            preserve_default=False,
>> +        ),
>> +
>> +        # as with 10, this will break if you use non-default table names
>> +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
>> +                               (SELECT project_id FROM patchwork_submission
>> +                                WHERE patchwork_submission.id =
>> +                                        patchwork_patch.submission_ptr_id);'''
>> +        ),
>> +
>> +        migrations.AlterField(
>> +            model_name='patch',
>> +            name='patch_project',
>> +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
>> +        ),
>> +
>> +    ]
>> diff --git a/patchwork/models.py b/patchwork/models.py
>> index b2491752f04a..3b905c4cd75b 100644
>> --- a/patchwork/models.py
>> +++ b/patchwork/models.py
>> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
>>      archived = models.BooleanField(default=False)
>>      hash = HashField(null=True, blank=True)
>>  
>> +    # duplicate project from submission in subclass so we can count the
>> +    # patches in a project without needing to do a JOIN.
>> +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
>> +
>>      objects = PatchManager()
>>  
>>      @staticmethod
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 803e98592fa8..805037c72d73 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
>>          patch = Patch.objects.create(
>>              msgid=msgid,
>>              project=project,
>> +            patch_project=project,
>>              name=name[:255],
>>              date=date,
>>              headers=headers,
>> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
>> index 3baf2999a836..f8d23a388ac7 100644
>> --- a/patchwork/views/__init__.py
>> +++ b/patchwork/views/__init__.py
>> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
>>              context['filters'].set_status(filterclass, setting)
>>  
>>      if patches is None:
>> -        patches = Patch.objects.filter(project=project)
>> +        patches = Patch.objects.filter(patch_project=project)
>>  
>>      # annotate with tag counts
>>      patches = patches.with_tag_counts(project)

