DB-murdering API query (index suggestions needed)
Daniel Axtens
dja at axtens.net
Tue Mar 17 15:17:00 AEDT 2020
Daniel Axtens <dja at axtens.net> writes:
> Daniel Axtens <dja at axtens.net> writes:
>
>>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state
>>> of suffering due to someone trying to suck down all patches in the
>>> linux-arm-kernel project. This is what the API request looked like:
>>>
>>> GET /api/1.1/patches/?project=62&before=2019-11-01T00:00:00&per_page=100&page=6150
>>>
>>> The query behind this takes about 1 minute to run on a 20-core HT Xeon
>>> system and requires creating a huge temporary file (there are 18375
>>> patches in that project).
>>
>> Ouch, I'm sorry to hear that.
>>
>>>
>>> So, two questions, really:
>>>
>>> 1. Any indexes we can put in place to make this query perform better?
>>
>> We have a bunch of db magic contributed by Stewart that will hit 2.2.
>>
>> Stewart, do you happen to know if any of your magic will affect API
>> queries? They're advertised as affecting the general listing of patches
>> in the UI, I'm not sure if they also affect this.
>>
>> If not, we can definitely have a look at getting an index or rate
>> limiting/authentication thingy in for 2.2.
>>
>
> Trying this out, it looks like we haven't fixed this for 2.2.
Yep. Adding a project filter creates this amazingly awful set of
queries, including a subquery. It looks like Stewart set out to fix a
similar problem within a view, but we don't have the same flexibility in
DRF. I think the new models will solve this but I really don't want to
wait.
Regards,
Daniel
3. SELECT COUNT(*) FROM (SELECT DISTINCT `patchwork_submission`.`id` AS Col1, `patchwork_submission`.`msgid` AS Col2, `patchwork_submission`.`date` AS Col3, `patchwork_submission`.`submitter_id` AS Col4, `patchwork_submission`.`project_id` AS Col5, `patchwork_submission`.`name` AS Col6, `patchwork_patch`.`submission_ptr_id` AS Col7, `patchwork_patch`.`commit_ref` AS Col8, `patchwork_patch`.`pull_url` AS Col9, `patchwork_patch`.`delegate_id` AS Col10, `patchwork_patch`.`state_id` AS Col11, `patchwork_patch`.`archived` AS Col12, `patchwork_patch`.`hash` AS Col13, `patchwork_patch`.`patch_project_id` AS Col14, `patchwork_patch`.`series_id` AS Col15, `patchwork_patch`.`number` AS Col16, `patchwork_patch`.`related_id` AS Col17 FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) WHERE `patchwork_submission`.`project_id` = 1) subquery
4. SELECT DISTINCT `patchwork_submission`.`id`, `patchwork_submission`.`msgid`, `patchwork_submission`.`date`, `patchwork_submission`.`submitter_id`, `patchwork_submission`.`project_id`, `patchwork_submission`.`name`, `patchwork_patch`.`submission_ptr_id`, `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`, `patchwork_patch`.`delegate_id`, `patchwork_patch`.`state_id`, `patchwork_patch`.`archived`, `patchwork_patch`.`hash`, `patchwork_patch`.`patch_project_id`, `patchwork_patch`.`series_id`, `patchwork_patch`.`number`, `patchwork_patch`.`related_id`, `patchwork_person`.`id`, `patchwork_person`.`email`, `patchwork_person`.`name`, `patchwork_person`.`user_id`, `patchwork_project`.`id`, `patchwork_project`.`linkname`, `patchwork_project`.`name`, `patchwork_project`.`listid`, `patchwork_project`.`listemail`, `patchwork_project`.`subject_match`, `patchwork_project`.`web_url`, `patchwork_project`.`scm_url`, `patchwork_project`.`webscm_url`, `patchwork_project`.`list_archive_url`, `patchwork_project`.`list_archive_url_format`, `patchwork_project`.`commit_url_format`, `patchwork_project`.`send_notifications`, `patchwork_project`.`use_tags`, `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, `patchwork_state`.`name`, `patchwork_state`.`slug`, `patchwork_state`.`ordering`, `patchwork_state`.`action_required`, `patchwork_series`.`id`, `patchwork_series`.`project_id`, `patchwork_series`.`cover_letter_id`, `patchwork_series`.`name`, `patchwork_series`.`date`, `patchwork_series`.`submitter_id`, `patchwork_series`.`version`, `patchwork_series`.`total`, T8.`id`, T8.`linkname`, T8.`name`, T8.`listid`, T8.`listemail`, T8.`subject_match`, T8.`web_url`, T8.`scm_url`, T8.`webscm_url`, T8.`list_archive_url`, T8.`list_archive_url_format`, T8.`commit_url_format`, T8.`send_notifications`, T8.`use_tags` FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) INNER JOIN `patchwork_project` ON (`patchwork_submission`.`project_id` = `patchwork_project`.`id`) INNER JOIN `patchwork_person` ON (`patchwork_submission`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` = `patchwork_state`.`id`) LEFT OUTER JOIN `patchwork_series` ON (`patchwork_patch`.`series_id` = `patchwork_series`.`id`) LEFT OUTER JOIN `patchwork_project` T8 ON (`patchwork_series`.`project_id` = T8.`id`) WHERE `patchwork_submission`.`project_id` = 1 ORDER BY `patchwork_patch`.`submission_ptr_id` ASC LIMIT 1
>
> Regards,
> Daniel
>
>> Regards,
>> Daniel
>>
>>> 2. Is there a way to disable anonymous API access?
>>>
>>> -K
>>> _______________________________________________
>>> Patchwork mailing list
>>> Patchwork at lists.ozlabs.org
>>> https://lists.ozlabs.org/listinfo/patchwork
More information about the Patchwork
mailing list