DB-murdering API query (index suggestions needed)
konstantin at linuxfoundation.org
Sat Nov 16 04:43:16 AEDT 2019
On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote:
> > GET
> > /api/1.1/patches/?project=62&before=2019-11-01T00:00:00&per_page=100&page=6150
> > The query behind this takes about 1 minute to run on a 20-core HT Xeon
> > system and requires creating a huge temporary file (there are 18375
> > patches in that project).
> Ouch, I'm sorry to hear that.
Well, it's true that some of kernel.org's projects are large beyond what
would be considered "sane". :)
(Another annoyance of mine is when someone requests a project but
doesn't end up using it. We end up accumulating huge amounts of dead
data over the years with no clear way of determining "is this useful to
anyone?" Now that we have mailing list archives available in a
public-inbox format, it's easier to justify deleting projects that are
clearly not being used.)
> We have a bunch of db magic contributed by Stewart that will hit 2.2.
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.
I'm not sure rate-limiting would have helped much in this particular
case, because whoever was doing it was not trying to be malicious. There
were only 2-3 queries running in parallel, but they were so heavy that
it was enough to affect DB performance. Per-second rate-limiting is also
easy enough to do on the upstream proxy, so I doubt it's worth the
effort to develop this feature in the code.
In fact, I can probably do the same for unauthenticated API access, but
it might be nicer if there's a simple toggle available in Patchwork's
UI. I wouldn't consider this a high-priority feature.
More information about the Patchwork