[PATCH 01/11] Improve patch listing performance (~3x)

Wed Aug 29 23:36:32 AEST 2018

On Sat, 2018-08-11 at 04:28 +1000, Daniel Axtens wrote:
> Stewart Smith <stewart at linux.ibm.com> writes:
> 
> > There's two main bits that are really expensive when composing the list
> > of patches for a project: the query getting the list, and the query
> > finding the series for each patch.
> > 
> > If we look at the query getting the list, it gets a lot of unnesseccary
> > fields such as 'headers' and 'content', even though we tell Django not
> > to. It turns out that Django seems to ignore the Submission relationship
> > and I have no idea how to force it to ignore that thing (defer doesn't
> > work) but if we go only, then it works okay.
> > 
> > From my import of ~8000 messages for a few projects, my laptop query
> > time (MySQL, as setup by whatever the docker-compose things do) goes
> > from:
> > 
> > http://localhost:8000/project/linux-kernel/list/
> > FROM:
> > 342ms SQL queries cold cache, 268ms warm cache
> > TO:
> > 118ms SQL queries cold cache, 88ms warm cache
> > 
> > Which is... non trivial to say the least.
> > 
> > The big jump is the patches.only change, and the removal of ordering
> > on the patchseries takes a further 10ms off. For some strange reason, it
> > seems rather hard to tell Django that you don't care what order the
> > results come back in for that query (if we do, then the db server has to
> > do a sort rather than just return each row)
> 
> Thanks Stewart! It's great to get some real DB experience - feel free to
> hang around! :)

+1000

> So, further to our conversation with Konstantin, I tested this against
> Django 2.0. It still saves us some time - it means we no longer load the
> following fields:
> 
> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
> `patchwork_patch`.`patch_project_id`, 

Out of curiosity, does this allow us to drop the patch_project_id field
entirely or not (bear in mind, I haven't read through the rest of
this).

> This obviously saves the db some work and communication overhead.
> 
> I'm a little nervous that this will slightly complicate some of the
> further denormalisation but I think that's probably a price worth paying
> unless Stephen objects.

To be honest, it could be 6 months before we get any further on big
efforts like this. A 3x performance boost right now beats an unknown
boost N months down the line, IMO. My 2c anyway.

> I do still want to test the 'ordering' change a bit more before
> committing it though.

I'm going to attempt to do my own testing of this (with a much larger
dataset) some evening this week. I'll hold off applying anything
pending this though.

Stephen