[PATCH] Fix slow Patch counting query

Stephen Finucane stephen at that.guru
Wed Mar 28 22:38:32 AEDT 2018


On Wed, 2018-03-28 at 09:42 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen at that.guru> writes:
> 
> > On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
> > > Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> > > the following query is really slow on OzLabs:
> > > 
> > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> > >     INNER JOIN "patchwork_submission" ON
> > >         ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
> > >     WHERE ("patchwork_submission"."project_id" = 14 AND
> > >            "patchwork_patch"."state_id" IN
> > > 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> > >                 WHERE U0."action_required" = true
> > > 		ORDER BY U0."ordering" ASC));
> > > 
> > > I think this is really slow because we have to join the patch and
> > > submission table to get the project id, which we need to filter the
> > > patches.
> > > 
> > > Duplicate the project id in the patch table itself, which allows us to
> > > avoid the JOIN.
> > > 
> > > The new query reads as:
> > > SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> > >     WHERE ("patchwork_patch"."patch_project_id" = 1 AND
> > >            "patchwork_patch"."state_id" IN
> > > 	       (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> > > 	        WHERE U0."action_required" = true
> > > 		ORDER BY U0."ordering" ASC));
> > > 
> > > Very simple testing on a small, artifical Postgres instance (3
> > > projects, 102711 patches), shows speed gains of ~1.5-5x for this
> > > query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> > > query vs the second query, we see a ~1.75x improvement there too.
> > > 
> > > I suspect the gains will be bigger on OzLabs.
> > > 
> > > (It turns out all of this is all for the "| NN patches" counter we
> > > added to the filter bar!!)
> > > 
> > > Reported-by: Stephen Rothwell <sfr at canb.auug.org.au>
> > > Signed-off-by: Daniel Axtens <dja at axtens.net>
> > 
> > It's unfortunate that this has already merged. While it works, it
> > defeats the whole point of the multi-table inheritance introduced in
> > commit 86172ccc1 (normalization). To be honest, given the performance
> > impacts that particular change introduced (which we're only seeing at
> > scale), I'd rather denormalize the whole thing and fold the 'Patch' and
> > 'CoverLetter' models back into 'Submission' and just use a 'type' field
> > (or similar) to control behavior. Is there any reason not to do this?
> 
> I agree that would be a better conceptual solution. The reason I didn't
> do it is because I tried a couple of times and couldn't get it all to
> work, and I haven't had the time to really sit down and re-engineer it.
> 
> If you can get it to work I am happy to revert this and apply that; the
> changes to the database schema aren't at all difficult to undo so any
> brave souls running master won't be completely hosed.

OK, that's fair. Let me see how I fare with this. The biggest issues I
see are the custom managers that we use for the Patch class. I'm hoping
Veronika's tag infrastructure rework will help with this and I can
build upon that. I'll review said RFC this week to see if that's the
case.

Stephen

> Regards,
> Daniel
> 
> > If not, I'd like to wait on 2.1 until we've done both this and the
> > event fixes.
> > 
> > Stephen
> > 
> > > ---
> > > 
> > > This requires a migration, so I don't think we can feasibly do it as a
> > > stable update.
> > > 
> > > I think we drop the patch counter for stable and try to get this and
> > > the event stuff merged to master promptly, and just tag 2.1. (To that
> > > end, I will re-read and finish reviewing the event stuff soon.)
> > > ---
> > >  patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
> > >  patchwork/models.py                              |  4 +++
> > >  patchwork/parser.py                              |  1 +
> > >  patchwork/views/__init__.py                      |  2 +-
> > >  4 files changed, 45 insertions(+), 1 deletion(-)
> > >  create mode 100644 patchwork/migrations/0024_patch_patch_project.py
> > > 
> > > diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> > > new file mode 100644
> > > index 000000000000..76d8f144c9dd
> > > --- /dev/null
> > > +++ b/patchwork/migrations/0024_patch_patch_project.py
> > > @@ -0,0 +1,39 @@
> > > +# -*- coding: utf-8 -*-
> > > +# Generated by Django 1.11.10 on 2018-03-08 01:51
> > > +from __future__ import unicode_literals
> > > +
> > > +from django.db import migrations, models
> > > +import django.db.models.deletion
> > > +
> > > +
> > > +class Migration(migrations.Migration):
> > > +    # per migration 16, but note this seems to be going away
> > > +    # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> > > +    atomic = False
> > > +
> > > +    dependencies = [
> > > +        ('patchwork', '0023_timezone_unify'),
> > > +    ]
> > > +
> > > +    operations = [
> > > +        migrations.AddField(
> > > +            model_name='patch',
> > > +            name='patch_project',
> > > +            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> > > +            preserve_default=False,
> > > +        ),
> > > +
> > > +        # as with 10, this will break if you use non-default table names
> > > +        migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> > > +                               (SELECT project_id FROM patchwork_submission
> > > +                                WHERE patchwork_submission.id =
> > > +                                        patchwork_patch.submission_ptr_id);'''
> > > +        ),
> > > +
> > > +        migrations.AlterField(
> > > +            model_name='patch',
> > > +            name='patch_project',
> > > +            field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> > > +        ),
> > > +
> > > +    ]
> > > diff --git a/patchwork/models.py b/patchwork/models.py
> > > index b2491752f04a..3b905c4cd75b 100644
> > > --- a/patchwork/models.py
> > > +++ b/patchwork/models.py
> > > @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
> > >      archived = models.BooleanField(default=False)
> > >      hash = HashField(null=True, blank=True)
> > >  
> > > +    # duplicate project from submission in subclass so we can count the
> > > +    # patches in a project without needing to do a JOIN.
> > > +    patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> > > +
> > >      objects = PatchManager()
> > >  
> > >      @staticmethod
> > > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > > index 803e98592fa8..805037c72d73 100644
> > > --- a/patchwork/parser.py
> > > +++ b/patchwork/parser.py
> > > @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
> > >          patch = Patch.objects.create(
> > >              msgid=msgid,
> > >              project=project,
> > > +            patch_project=project,
> > >              name=name[:255],
> > >              date=date,
> > >              headers=headers,
> > > diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> > > index 3baf2999a836..f8d23a388ac7 100644
> > > --- a/patchwork/views/__init__.py
> > > +++ b/patchwork/views/__init__.py
> > > @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
> > >              context['filters'].set_status(filterclass, setting)
> > >  
> > >      if patches is None:
> > > -        patches = Patch.objects.filter(project=project)
> > > +        patches = Patch.objects.filter(patch_project=project)
> > >  
> > >      # annotate with tag counts
> > >      patches = patches.with_tag_counts(project)



More information about the Patchwork mailing list