[PATCH 01/11] Improve patch listing performance (~3x)

Fri Aug 31 14:57:24 AEST 2018

Stephen Finucane <stephen at that.guru> writes:

> On Sat, 2018-08-11 at 04:28 +1000, Daniel Axtens wrote:
>> Stewart Smith <stewart at linux.ibm.com> writes:
>> 
>> > There's two main bits that are really expensive when composing the list
>> > of patches for a project: the query getting the list, and the query
>> > finding the series for each patch.
>> > 
>> > If we look at the query getting the list, it gets a lot of unnesseccary
>> > fields such as 'headers' and 'content', even though we tell Django not
>> > to. It turns out that Django seems to ignore the Submission relationship
>> > and I have no idea how to force it to ignore that thing (defer doesn't
>> > work) but if we go only, then it works okay.
>> > 
>> > From my import of ~8000 messages for a few projects, my laptop query
>> > time (MySQL, as setup by whatever the docker-compose things do) goes
>> > from:
>> > 
>> > http://localhost:8000/project/linux-kernel/list/
>> > FROM:
>> > 342ms SQL queries cold cache, 268ms warm cache
>> > TO:
>> > 118ms SQL queries cold cache, 88ms warm cache
>> > 
>> > Which is... non trivial to say the least.
>> > 
>> > The big jump is the patches.only change, and the removal of ordering
>> > on the patchseries takes a further 10ms off. For some strange reason, it
>> > seems rather hard to tell Django that you don't care what order the
>> > results come back in for that query (if we do, then the db server has to
>> > do a sort rather than just return each row)
>> 
>> Thanks Stewart! It's great to get some real DB experience - feel free to
>> hang around! :)
>
> +1000
>
>> So, further to our conversation with Konstantin, I tested this against
>> Django 2.0. It still saves us some time - it means we no longer load the
>> following fields:
>> 
>> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
>> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
>> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
>> `patchwork_patch`.`patch_project_id`, 
>
> Out of curiosity, does this allow us to drop the patch_project_id field
> entirely or not (bear in mind, I haven't read through the rest of
> this).
>
>> This obviously saves the db some work and communication overhead.
>> 
>> I'm a little nervous that this will slightly complicate some of the
>> further denormalisation but I think that's probably a price worth paying
>> unless Stephen objects.
>
> To be honest, it could be 6 months before we get any further on big
> efforts like this. A 3x performance boost right now beats an unknown
> boost N months down the line, IMO. My 2c anyway.

OK then, reaffirming my in-principle ack to merging these once Stephen
or I can test the things they touch.

Regards,
Daniel 

>
>> I do still want to test the 'ordering' change a bit more before
>> committing it though.
>
> I'm going to attempt to do my own testing of this (with a much larger
> dataset) some evening this week. I'll hold off applying anything
> pending this though.
>
> Stephen