0039_unique_series_references migration needs optimization

Wed Oct 14 04:54:52 AEDT 2020

On Tue, 13 Oct 2020 at 13:15, Konstantin Ryabitsev
<konstantin at linuxfoundation.org> wrote:
> I may be venting some frustration, but I'm now 4 hours into a
> 2.1 -> 2.2 upgrade. The following query from 0039 has now been running
> over 3 hours:
>
>       COMMAND: Query
>          TIME: 10921
>         STATE: Sending data
>          INFO: DELETE a FROM patchwork_seriesreference a
>             INNER JOIN patchwork_seriesreference b
>             WHERE
>               a.id < b.id
>               AND a.project_id = b.project_id
>               AND a.msgid = b.msgid
>
> That table has 220,000 rows, but it still seems strange that it's taking
> so long to execute. As there is no way to know if it's about to be
> finished or will take another few hours to run, I'm not willing to kill
> it yet.
>
> Would adding an index across (id, project_id, msgid) make this query
> faster to run?

I can confirm that adding a temporary index on (project_id, msgid) and
then rerunning the above query makes it complete in less than 2
seconds. So, the migration should do the following:

add the project field to the model
create temporary non-unique index across msgid, project_id
run the duplicate-removal query above
delete temporary index
create the unique index

-K

-- 
Konstantin Ryabitsev
Director, IT Core Projects
The Linux Foundation
Montréal, Québec