[PATCH 10/11] Be sensible computing project patch counts

Mon Sep 10 12:56:16 AEST 2018

Stephen Finucane <stephen at that.guru> writes:
> On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote:
>> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> > Django actively fights constructing a query that isn't insane.
>> > 
>> > So, let's go and just execute a raw one. This is all very standard
>> > SQL so should execute everywhere without a problem.
>> > 
>> > With the dataset of patchwork.ozlabs.org, looking at the /project/
>> > page for qemu-devel would take 13 queries and 1500ms,
>> > with this patch it's down to 11 queries in ~250ms.
>> > For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> > 
>> > Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
>> 
>> I'm generally reluctant to dive into SQL unless it's absolutely
>> required. This is simple enough and the gains are significant so
>> 
>> Reviewed-by: Stephen Finucane <stephen at that.guru>
>
> Actually, there's a potential issue here which I spotted while testing
> with a smaller (odd, I know) set of patches. I can fix at merge time,
> assuming a respin isn't needed.
>
>> Is this a potential bug report for Django?
>> 
>> Stephen
>> 
>> > ---
>> >  patchwork/views/project.py | 23 ++++++++++++++++++++---
>> >  1 file changed, 20 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py
>> > index 484455c02d9d..2a75242a06af 100644
>> > --- a/patchwork/views/project.py
>> > +++ b/patchwork/views/project.py
>> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse
>> >  from patchwork.models import Patch
>> >  from patchwork.models import Project
>> >  
>> > +from django.db import connection
>> > +
>> >  
>> >  def project_list(request):
>> >      projects = Project.objects.all()
>> > @@ -44,14 +46,29 @@ def project_list(request):
>> >  
>> >  def project_detail(request, project_id):
>> >      project = get_object_or_404(Project, linkname=project_id)
>> > -    patches = Patch.objects.filter(project=project)
>> > +
>> > +    # So, we revert to raw sql because if we do what you'd think would
>> > +    # be the correct thing in Django-ese, it ends up doing a *pointless*
>> > +    # join with patchwork_submissions that ends up ruining the query.
>> > +    # So, we do not do this, as this is wrong:
>> > +    #patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
>> > +    #patches = patches.annotate(c=Count('archived'))
>> > +    # and instead do this, because it's simple and fast
>> > +
>> > +    n_patches = {}
>> > +    with connection.cursor() as cursor:
>> > +        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
>> > +                           [project.id])
>> > +
>> > +        for r in cursor:
>> > +            n_patches[r[0]] = r[1]
>> >  
>> >      context = {
>> >          'project': project,
>> >          'maintainers': User.objects.filter(
>> >              profile__maintainer_projects=project),
>> > -        'n_patches': patches.filter(archived=False).count(),
>> > -        'n_archived_patches': patches.filter(archived=True).count(),
>> > +        'n_patches': n_patches[False],
>> > +        'n_archived_patches': n_patches[True],
>
> It's possible that (a) there are no patches, (b) all patches are
> archived, or (c) no patches are archived. If any of these is true,
> you'll get a KeyError. We can fix this very simply like so:
>
>   'n_patches': n_patches[False] if False in n_patches else 0,
>   'n_archived_patches': n_patches[True] if True in n_patches else 0,
>
> As noted above, I'll fix this at merge time and add a test to ensure we
> don't regress.

Oh, neat! Yeah, I likely didn't test those scenarios at all; I was
focused on large amounts of data rather than no data.
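
For reference, the failure mode and the guarded lookup can be reproduced
outside Django with something like the sketch below. It is only an
illustration under a few assumptions: it uses sqlite3 from the standard
library rather than django.db.connection (so the placeholder is ? instead
of %s), the table is cut down to the three columns the query touches, the
rows are made up, and the helper names are hypothetical and not part of
the patch. It uses dict.get() with a default, which should behave the same
as the "x if k in d else 0" form suggested in the review.

```python
import sqlite3


def count_patches_by_archived(conn, project_id):
    """Run the single GROUP BY query and return {archived: count}.

    A GROUP BY only yields rows for groups that exist, so the result may
    contain both keys, one key, or no keys at all.
    """
    cursor = conn.execute(
        "SELECT archived, COUNT(submission_ptr_id) AS c "
        "FROM patchwork_patch WHERE patch_project_id = ? "
        "GROUP BY archived",
        [project_id],
    )
    # sqlite3 returns booleans as 0/1, so normalise them to bool keys.
    return {bool(archived): count for archived, count in cursor}


def patch_counts(conn, project_id):
    """Build the two context values, defaulting missing groups to 0."""
    counts = count_patches_by_archived(conn, project_id)
    return {
        "n_patches": counts.get(False, 0),
        "n_archived_patches": counts.get(True, 0),
    }


# Cut-down stand-in for the real table (assumption: only these columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patchwork_patch ("
    "submission_ptr_id INTEGER PRIMARY KEY, "
    "patch_project_id INTEGER, "
    "archived INTEGER)"
)
conn.executemany(
    "INSERT INTO patchwork_patch VALUES (?, ?, ?)",
    [
        (1, 1, 0),  # project 1: two patches, none archived
        (2, 1, 0),
        (3, 2, 1),  # project 2: one patch, archived
    ],
)

for project_id in (1, 2, 3):  # project 3 has no patches at all
    print(project_id, patch_counts(conn, project_id))
```

With a plain n_patches[False] / n_patches[True] lookup, each of the three
projects above would raise KeyError for at least one of the two keys.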

-- 
Stewart Smith
OPAL Architect, IBM.