0039_unique_series_references migration needs optimization
Konstantin Ryabitsev
konstantin at linuxfoundation.org
Wed Oct 14 04:54:52 AEDT 2020
On Tue, 13 Oct 2020 at 13:15, Konstantin Ryabitsev
<konstantin at linuxfoundation.org> wrote:
> I may be venting some frustration, but I'm now 4 hours into a
> 2.1 -> 2.2 upgrade. The following query from 0039 has now been running
> over 3 hours:
>
> COMMAND: Query
> TIME: 10921
> STATE: Sending data
> INFO: DELETE a FROM patchwork_seriesreference a
> INNER JOIN patchwork_seriesreference b
> WHERE
> a.id < b.id
> AND a.project_id = b.project_id
> AND a.msgid = b.msgid
>
> That table has 220,000 rows, but it still seems strange that it's taking
> so long to execute. As there is no way to know if it's about to be
> finished or will take another few hours to run, I'm not willing to kill
> it yet.
>
> Would adding an index across (id, project_id, msgid) make this query
> faster to run?
I can confirm that adding a temporary index on (project_id, msgid) and
then rerunning the above query makes it complete in less than 2
seconds. So, the migration should do the following:
add the project field to the model
create temporary non-unique index across msgid, project_id
run the duplicate-removal query above
delete temporary index
create the unique index
-K
--
Konstantin Ryabitsev
Director, IT Core Projects
The Linux Foundation
Montréal, Québec
More information about the Patchwork
mailing list