[PATCH v2 9/9] parser: don't fail on multiple SeriesReferences

Daniel Axtens dja at axtens.net
Sat Mar 3 14:30:37 AEDT 2018


Stephen Finucane <stephen at that.guru> writes:

> On Mon, 2018-02-26 at 15:55 +1100, Daniel Axtens wrote:
>> Stephen Finucane <stephen at that.guru> writes:
>> 
>> > On Sun, 2018-02-25 at 01:50 +1100, Daniel Axtens wrote:
>> > > Parallel parsing would occasonally fail with:
>> > > 
>> > > patchwork.models.MultipleObjectsReturned: get() returned more than one SeriesReference -- it returned 2!
>> > > 
>> > > I think these are happening if you have different processes parsing
>> > > e.g. 1/3 and 2/3 simultaneously: both will have a reference to 1/3,
>> > > in the case of 1 it will be the msgid, in the case of 2 it will be
>> > > in References. So when we come to parse 3/3, .get() finds 2 and
>> > > throws the exception.
>> > > 
>> > > This does not fix the creation of multiple series references; it
>> > > just causes them to be ignored. We still have serious race conditions
>> > > with series creation, but I don't yet have clear answers for them.
>> > > With this patch, they will at least not stop patches from being
>> > > processed - they'll just lead to wonky series, which we already have.
>> > > 
>> > > Reviewed-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>> > > Signed-off-by: Daniel Axtens <dja at axtens.net>
>> > 
>> > Correct me if I'm wrong, but this seems to highlight two issues:
>> > 
>> >  * The race condition between searching for a series reference and
>> >    using it
>> >  * The fact we expect a series to be unique for a given msgid and
>> >    project, yet the 'unique_together' is for a given msgid and
>> >    *series*.
>> > 
>> > The first of these is caught here (yet not fixed, as it's a non-trivial 
>> > thing to resolve). The second should still be fixed though. Should we
>> > do this here, as part of this series?
>> 
>> My thinking is no, for these reasons:
>> 
>>  - I want a patch I can backport to stable without requiring a database
>>    migration, because:
>> 
>>     * that seems like a very unusual thing for a stable update
>> 
>>     * I don't trust my ability to get it right for a stable release
>> 
>>     * relatedly, it will require a lot of testing for me to trust it,
>>       and I want something sooner rather than later so that the
>>       Buildroot people can get on with their lives.
>> 
>>     * The different numbering of migrations will make the different
>>       upgrade paths a nightmare to describe, yet alone test
>
> Yup, I had no considered the backportability aspects of this. That's a
> very valid point. Let's do this separately.
>
>>  - I have a long-term ongoing effort to move 'project' up into the
>>    various tables that use it so as to avoid JOINs and this would be a
>>    good thing to do in that effort. (It's the next big push on my list,
>>    it's just a nightmare to get the compatibility and migration code
>>    going.)
>
> We should probably discuss this separately, but is this a significant
> part of the issue? Far as I could see, there were two different issues
> with the DB causing our pain:
>
>  * The number of JOINs on Event, which I have an RFC waiting on reviews
>    for.
>  * The need to JOIN for Patch and CoverLetter, which could probably be
>    resolved by flattening this back down into Submission.
>
> Have I missed something?

I think for Patch and CoverLetter, we need to pull project out of the
submission table and in to patch/coverletter tables. This is my top
priority after merging the various patches that people post, but it's
surprisingly tricky. If we put it in both Submission and Patch, we need
to migrate the data and then keep the duplicate data in sync. If we tear
it out of submission entirely, we end up doing one big (very big!)
migration and I haven't been able to figure that out yet.

For this issue, we also need to pull project into seriesreference,
either in addition to or instead of in Series. I guess that's at least
isolated from the change to submission but I haven't tried yet.

Regards,
Daniel


> Stephen
>
>> Regards,
>> Daniel
>> 
>> > 
>> > Stephen
>> > 
>> > > ---
>> > >  patchwork/parser.py | 23 +++++++++++++++++++++--
>> > >  1 file changed, 21 insertions(+), 2 deletions(-)
>> > > 
>> > > diff --git a/patchwork/parser.py b/patchwork/parser.py
>> > > index 56dc7006c811..5a7344cee93c 100644
>> > > --- a/patchwork/parser.py
>> > > +++ b/patchwork/parser.py
>> > > @@ -240,6 +240,13 @@ def _find_series_by_references(project, mail):
>> > >                  msgid=ref[:255], series__project=project).series
>> > >          except SeriesReference.DoesNotExist:
>> > >              continue
>> > > +        except SeriesReference.MultipleObjectsReturned:
>> > > +            # FIXME: Open bug: this can happen when we're processing
>> > > +            # messages in parallel. Pick the first and log.
>> > > +            logger.error("Multiple SeriesReferences for %s in project %s!" %
>> > > +                         (ref[:255], project.name))
>> > > +            return SeriesReference.objects.filter(
>> > > +                msgid=ref[:255], series__project=project).first().series
>> > >  
>> > >  
>> > >  def _find_series_by_markers(project, mail, author):
>> > > @@ -1037,6 +1044,9 @@ def parse_mail(mail, list_id=None):
>> > >                                                  series__project=project)
>> > >                  except SeriesReference.DoesNotExist:
>> > >                      SeriesReference.objects.create(series=series, msgid=ref)
>> > > +                except SeriesReference.MultipleObjectsReturned:
>> > > +                    logger.error("Multiple SeriesReferences for %s"
>> > > +                                 " in project %s!" % (ref, project.name))
>> > >  
>> > >          # add to a series if we have found one, and we have a numbered
>> > >          # patch. Don't add unnumbered patches (for example diffs sent
>> > > @@ -1075,6 +1085,11 @@ def parse_mail(mail, list_id=None):
>> > >                      msgid=msgid, series__project=project).series
>> > >              except SeriesReference.DoesNotExist:
>> > >                  series = None
>> > > +            except SeriesReference.MultipleObjectsReturned:
>> > > +                logger.error("Multiple SeriesReferences for %s"
>> > > +                             " in project %s!" % (msgid, project.name))
>> > > +                series = SeriesReference.objects.filter(
>> > > +                    msgid=msgid, series__project=project).first().series
>> > >  
>> > >              if not series:
>> > >                  series = Series(project=project,
>> > > @@ -1087,8 +1102,12 @@ def parse_mail(mail, list_id=None):
>> > >                  # we don't save the in-reply-to or references fields
>> > >                  # for a cover letter, as they can't refer to the same
>> > >                  # series
>> > > -                SeriesReference.objects.get_or_create(series=series,
>> > > -                                                      msgid=msgid)
>> > > +                try:
>> > > +                    SeriesReference.objects.get_or_create(series=series,
>> > > +                                                          msgid=msgid)
>> > > +                except SeriesReference.MultipleObjectsReturned:
>> > > +                    logger.error("Multiple SeriesReferences for %s"
>> > > +                                 " in project %s!" % (msgid, project.name))
>> > >  
>> > >              cover_letter = CoverLetter(
>> > >                  msgid=msgid,


More information about the Patchwork mailing list