[PATCH 2/3] migrations: Don't fail if it's not possible to rehome a patch

Stephen Finucane stephen at that.guru
Thu Mar 5 21:03:55 AEDT 2020


On Thu, 2020-03-05 at 11:44 +1100, Daniel Axtens wrote:
> Aaah this is terrifying. I now wish I had reviewed this code much more
> closely, and I am now super nervous about releasing this on the world.

Eh, it's not really. We're not touching patches themselves, only series
metadata, so if we screw up slightly so be it. Also, on PostgreSQL a
failing migration like this will trigger a rollback iirc. It's just an
additional corner case I failed to foresee ahead of time :(
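
For context: Django runs each migration, RunPython data migrations
included, inside a transaction on databases with transactional DDL like
PostgreSQL, so an unhandled IntegrityError rolls the whole step back
rather than leaving the tables half-merged. Roughly this shape (the
dependency name below is a placeholder, not the real one):

    from django.db import migrations


    def merge_duplicate_series(apps, schema_editor):
        # the data fix-up; an unhandled IntegrityError raised here aborts
        # the whole migration and, on PostgreSQL, rolls it back
        pass


    class Migration(migrations.Migration):
        # atomic defaults to True; set atomic = False only if partial
        # progress should be committed even when a later row fails
        atomic = True

        dependencies = [
            ('patchwork', '0038_placeholder'),  # placeholder dependency
        ]

        operations = [
            migrations.RunPython(
                merge_duplicate_series, migrations.RunPython.noop),
        ]

Which is also why the patch below catches IntegrityError per patch
rather than letting it escape and fail the whole migration.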

> Stephen Finucane <stephen at that.guru> writes:
> 
> > Migration 0039 attempts to move patches that have ended up in an
> > arbitrary series due to race conditions into the correct series.
> > However, because we weren't previously considering versions when
> > choosing a series, this might not be possible, resulting in the issues
> > seen in #340 [1]. If this happens, we could try to rehome _those_ patches
> > but that's really complicated and requires bringing in a whole load of
> > parsing functionality from 'patchwork.parser' so we can e.g. rebuild the
> > list of references from the headers. Instead, print a big warning to the
> > admin with information about how they could fix the issue and then just
> > remove the offending references. This means we're still left with
> > orphaned series but these could be fixed manually, if necessary.
> > 
> > [1] https://github.com/getpatchwork/patchwork/issues/340
> > 
> > Signed-off-by: Stephen Finucane <stephen at that.guru>
> > Closes: #340
> > ---
> >  .../0039_unique_series_references.py          | 35 +++++++++++++++++--
> >  1 file changed, 32 insertions(+), 3 deletions(-)
> > 
> > diff --git a/patchwork/migrations/0039_unique_series_references.py b/patchwork/migrations/0039_unique_series_references.py
> > index 99b10fcc..c8ac0dc6 100644
> > --- a/patchwork/migrations/0039_unique_series_references.py
> > +++ b/patchwork/migrations/0039_unique_series_references.py
> > @@ -1,6 +1,8 @@
> >  from django.db import connection, migrations, models
> >  from django.db.models import Count
> > +from django.db.utils import IntegrityError
> >  import django.db.models.deletion
> > +from django.utils import six
> >  
> >  
> >  def merge_duplicate_series(apps, schema_editor):
> > @@ -49,14 +51,41 @@ def merge_duplicate_series(apps, schema_editor):
> >              if series_ref == chosen_ref:
> >                  continue
> >  
> > +            has_conflict = False
> > +
> >              # update the patches to point to our chosen series instead, on the
> >              # assumption that all other metadata is correct
> >              for patch in Patch.objects.filter(series=series_ref.series):
> >                  patch.series = chosen_ref.series
> > -                patch.save()
> > +                try:
> > +                    patch.save()
> > +                except IntegrityError as exc:
> > +                    has_conflict = True
> > +                    print(
> > +                        "We attempted to merge patch '%d' into the correct "
> > +                        "series, '%d', but it appears there is already a "
> > +                        "patch with the same number in this series. We have "
> > +                        "deleted the series references for the bad series, "
> > +                        "'%d', but you may wish to do further manual work to "
> > +                        "resolve this issue. For more information, refer to "
> > +                        "https://git.io/JvVvV. Error: %s" % (
> 
> Should we use the full canonical url rather than a link shortener that
> might die at some point?

We could. Hardly seems worth it though, since this is GitHub's official
shortener and by the time that dies, it's likely GitHub will have gone
with it (or at least no longer be relevant). Also, the migration will be
gone (I'm hoping to do a big squash of migrations for 3.0).

> Also, given that this is potentially destructive, would we be better to
> just leave old data alone? Or at least give admins the option to leave
> old data alone? The value of correct series information ages out pretty
> quickly...

We can't, at least not entirely. We need to get rid of any series
reference that shares the same project and message ID since we want to
enforce a uniqueness constraint on these fields. We could just delete
all offending series references, since they're only used when building
series and not afterwards, but the above does work in the general case
(as tested by your parallel parsing patch), and we now have a fallback
to handle the things we can't easily auto-resolve, so I'd like to
continue with this approach.
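
For reference, the end state all of this is in service of is roughly a
uniqueness constraint over project and message ID on SeriesReference,
i.e. something like the following (a sketch from memory; the field names
and exact operations are in 0039 itself):

    # field names assumed here; the duplicate clean-up above has to run
    # first, otherwise applying this constraint fails on existing rows
    migrations.AlterUniqueTogether(
        name='seriesreference',
        unique_together={('project', 'msgid')},
    ),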

Stephen

> Regards,
> Daniel
> 
> > +                            patch.id,
> > +                            chosen_ref.series.id,
> > +                            series_ref.series.id,
> > +                            six.text_type(exc),
> > +                        )
> > +                    )
> >  
> > -            # delete the other series (which will delete the series ref)
> > -            series_ref.series.delete()
> > +            if not has_conflict:
> > +                # assuming there has been no conflict and all patches have been
> > +                # moved to a new series, delete the other series (which will
> > +                # delete the series ref)
> > +                series_ref.series.delete()
> > +            else:
> > +                # otherwise just delete the series references and keep the old
> > +                # series for later cleanup
> > +                for ref in series_ref.series.references.all():
> > +                    ref.delete()
> >  
> >  
> >  def copy_project_field(apps, schema_editor):
> > -- 
> > 2.24.1
> > 