[PATCH] Fix slow Patch counting query
Stephen Finucane
stephen at that.guru
Mon Mar 26 02:03:33 AEDT 2018
On Fri, 2018-03-09 at 02:54 +1100, Daniel Axtens wrote:
> Stephen Rothwell noticed (way back in September - sorry Stephen!) that
> the following query is really slow on OzLabs:
>
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> INNER JOIN "patchwork_submission" ON
> ("patchwork_patch"."submission_ptr_id" = "patchwork_submission"."id")
> WHERE ("patchwork_submission"."project_id" = 14 AND
> "patchwork_patch"."state_id" IN
> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> WHERE U0."action_required" = true
> ORDER BY U0."ordering" ASC));
>
> I think this is really slow because we have to join the patch and
> submission table to get the project id, which we need to filter the
> patches.
>
> Duplicate the project id in the patch table itself, which allows us to
> avoid the JOIN.
>
> The new query reads as:
> SELECT COUNT(*) AS "__count" FROM "patchwork_patch"
> WHERE ("patchwork_patch"."patch_project_id" = 1 AND
> "patchwork_patch"."state_id" IN
> (SELECT U0."id" AS Col1 FROM "patchwork_state" U0
> WHERE U0."action_required" = true
> ORDER BY U0."ordering" ASC));
>
> Very simple testing on a small, artifical Postgres instance (3
> projects, 102711 patches), shows speed gains of ~1.5-5x for this
> query. Looking at Postgres' cost estimates (EXPLAIN) of the first
> query vs the second query, we see a ~1.75x improvement there too.
>
> I suspect the gains will be bigger on OzLabs.
>
> (It turns out all of this is all for the "| NN patches" counter we
> added to the filter bar!!)
>
> Reported-by: Stephen Rothwell <sfr at canb.auug.org.au>
> Signed-off-by: Daniel Axtens <dja at axtens.net>
It's unfortunate that this has already merged. While it works, it
defeats the whole point of the multi-table inheritance introduced in
commit 86172ccc1 (normalization). To be honest, given the performance
impacts that particular change introduced (which we're only seeing at
scale), I'd rather denormalize the whole thing and fold the 'Patch' and
'CoverLetter' models back into 'Submission' and just use a 'type' field
(or similar) to control behavior. Is there any reason not to do this?
If not, I'd like to wait on 2.1 until we've done both this and the
event fixes.
Stephen
> ---
>
> This requires a migration, so I don't think we can feasibly do it as a
> stable update.
>
> I think we drop the patch counter for stable and try to get this and
> the event stuff merged to master promptly, and just tag 2.1. (To that
> end, I will re-read and finish reviewing the event stuff soon.)
> ---
> patchwork/migrations/0024_patch_patch_project.py | 39 ++++++++++++++++++++++++
> patchwork/models.py | 4 +++
> patchwork/parser.py | 1 +
> patchwork/views/__init__.py | 2 +-
> 4 files changed, 45 insertions(+), 1 deletion(-)
> create mode 100644 patchwork/migrations/0024_patch_patch_project.py
>
> diff --git a/patchwork/migrations/0024_patch_patch_project.py b/patchwork/migrations/0024_patch_patch_project.py
> new file mode 100644
> index 000000000000..76d8f144c9dd
> --- /dev/null
> +++ b/patchwork/migrations/0024_patch_patch_project.py
> @@ -0,0 +1,39 @@
> +# -*- coding: utf-8 -*-
> +# Generated by Django 1.11.10 on 2018-03-08 01:51
> +from __future__ import unicode_literals
> +
> +from django.db import migrations, models
> +import django.db.models.deletion
> +
> +
> +class Migration(migrations.Migration):
> + # per migration 16, but note this seems to be going away
> + # in new PostgreSQLs (https://stackoverflow.com/questions/12838111/south-cannot-alter-table-because-it-has-pending-trigger-events#comment44629663_12838113)
> + atomic = False
> +
> + dependencies = [
> + ('patchwork', '0023_timezone_unify'),
> + ]
> +
> + operations = [
> + migrations.AddField(
> + model_name='patch',
> + name='patch_project',
> + field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> + preserve_default=False,
> + ),
> +
> + # as with 10, this will break if you use non-default table names
> + migrations.RunSQL('''UPDATE patchwork_patch SET patch_project_id =
> + (SELECT project_id FROM patchwork_submission
> + WHERE patchwork_submission.id =
> + patchwork_patch.submission_ptr_id);'''
> + ),
> +
> + migrations.AlterField(
> + model_name='patch',
> + name='patch_project',
> + field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='patchwork.Project'),
> + ),
> +
> + ]
> diff --git a/patchwork/models.py b/patchwork/models.py
> index b2491752f04a..3b905c4cd75b 100644
> --- a/patchwork/models.py
> +++ b/patchwork/models.py
> @@ -423,6 +423,10 @@ class Patch(SeriesMixin, Submission):
> archived = models.BooleanField(default=False)
> hash = HashField(null=True, blank=True)
>
> + # duplicate project from submission in subclass so we can count the
> + # patches in a project without needing to do a JOIN.
> + patch_project = models.ForeignKey(Project, on_delete=models.CASCADE)
> +
> objects = PatchManager()
>
> @staticmethod
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 803e98592fa8..805037c72d73 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -1004,6 +1004,7 @@ def parse_mail(mail, list_id=None):
> patch = Patch.objects.create(
> msgid=msgid,
> project=project,
> + patch_project=project,
> name=name[:255],
> date=date,
> headers=headers,
> diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
> index 3baf2999a836..f8d23a388ac7 100644
> --- a/patchwork/views/__init__.py
> +++ b/patchwork/views/__init__.py
> @@ -270,7 +270,7 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
> context['filters'].set_status(filterclass, setting)
>
> if patches is None:
> - patches = Patch.objects.filter(project=project)
> + patches = Patch.objects.filter(patch_project=project)
>
> # annotate with tag counts
> patches = patches.with_tag_counts(project)
More information about the Patchwork
mailing list