[PATCH] Fix slow Patch counting query
Daniel Axtens
dja at axtens.net
Wed Mar 21 10:04:23 AEDT 2018
I have fixed up a small error where I didn't update a test and applied
this to master. I'm just waiting for Travis CI to run on my repository,
and then I will push it to the main repo.
As this is very important to OzLabs, I will spin a 2.1-rc1 soon.
Regards,
Daniel
Daniel Axtens <dja at axtens.net> writes:
> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> the following query is really slow on OzLabs:
>
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> INNER JOIN "patchwork_submission" ON
> ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
> WHERE ("patchwork_submission"."project_id" = 14 AND
> "patchwork_patch"."state_id" IN
> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> WHERE U0."action_required" = true
> ORDER BY U0."ordering" ASC));
>
> I think this is really slow because we have to join the patch and
> submission table to get the project id, which we need to filter the
> patches.
>
> Duplicate the project id in the patch table itself, which allows us to
> avoid the JOIN.
>
> The new query reads as:
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> WHERE ("patchwork_patch"."patch_project_id" = 1 AND
> "patchwork_patch"."state_id" IN
> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> WHERE U0."action_required" = true
> ORDER BY U0."ordering" ASC));
>
> Very simple testing on a small, artifical Postgres instance (3
> projects, 102711 patches), shows speed gains of ~1.5-5x for this
> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> query vs the second query, we see a ~1.75x improvement there too.
>
> I suspect the gains will be bigger on OzLabs.
>
> (It turns out all of this is all for the "| NN patches" counter we
> added to the filter bar!!)
>
> Reported-by: Stephen Rothwell <sfr at canb.auug.org.au>
> Signed-off-by: Daniel Axtens <dja at axtens.net>
>
> ---
>
> This requires a migration, so I don't think we can feasibly do it as a
> stable update.
>
> I think we drop the patch counter for stable and try to get this and
> the event stuff merged to master promptly, and just tag 2.1. (To that
> end, I will re-read and finish reviewing the event stuff soon.)
> ---
> patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
> patchwork/models.py | 4 +++
> patchwork/parser.py | 1 +
> patchwork/views/__init__.py | 2 +-
> 4 files changed, 45 insertions(+), 1 deletion(-)
> create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>
> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> new file mode 100644
> index 000000000000..76d8f144c9dd
> --- /dev/null
> +++ b/patchwork/migrations/0024_patch_patch_project.py
> @@ -0,0 +1,39 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.10 on 2018-03-08 01:51
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +
> +class Migration(migrations.Migration):
> + # per migration 16, but note this seems to be going away
> + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> + atomic = False
> +
> + dependencies = [
> + ('patchwork', '0023_timezone_unify'),
> + ]
> +
> + operations = [
> + migrations.AddField(
> + model_name='patch',
> + name='patch_project',
> + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> + preserve_default=False,
> + ),
> +
> + # as with 10, this will break if you use non-default table names
> + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> + (SELECT project_id FROM patchwork_submission
> + WHERE patchwork_submission.id =
> + patchwork_patch.submission_ptr_id);'''
> + ),
> +
> + migrations.AlterField(
> + model_name='patch',
> + name='patch_project',
> + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> + ),
> +
> + ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index b2491752f04a..3b905c4cd75b 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
> archived = models.BooleanField(default=False)
> hash = HashField(null=True, blank=True)
>
> + # duplicate project from submission in subclass so we can count the
> + # patches in a project without needing to do a JOIN.
> + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> +
> objects = PatchManager()
>
> @staticmethod
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 803e98592fa8..805037c72d73 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
> patch = Patch.objects.create(
> msgid=msgid,
> project=project,
> + patch_project=project,
> name=name[:255],
> date=date,
> headers=headers,
> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> index 3baf2999a836..f8d23a388ac7 100644
> --- a/patchwork/views/__init__.py
> +++ b/patchwork/views/__init__.py
> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
> context['filters'].set_status(filterclass, setting)
>
> if patches is None:
> - patches = Patch.objects.filter(project=project)
> + patches = Patch.objects.filter(patch_project=project)
>
> # annotate with tag counts
> patches = patches.with_tag_counts(project)
> --
> 2.14.1
More information about the Patchwork
mailing list