[PATCH v6 3/8] parser: Add series parsing

Stephen Finucane stephen at that.guru
Thu Oct 27 02:51:52 AEDT 2016


On 2016-10-26 09:24, Daniel Axtens wrote:
> Hi Stephen,
> 
>> +    for ref in find_references(mail, True):
>> +        # try parsing by RFC5322 fields first
>> +        try:
>> +            return SeriesReference.objects.get(msgid=ref).series
>> +        except SeriesReference.DoesNotExist:
>> +            pass
> 
> Hopefully I've understood what's going on here:
>  - first, check for a Series that directly matches this message's
>    message id (hence the True to find_references)
>  - then, check for a series that matches an In-Reply-To
>  - then, check for a series that matches the References, from most
>    recent to least recent (hopefully!)
> 
> 
> Is that right?

Sure is. We prefer locality to the current email where possible. I've 
added comments to clarify this.

> I think I would prefer pulling out the check for the Message ID into a
> different test, rather than the extra flag to find_references - I'm not
> sure a message ID is a reference in the sense in the usual sense. But
> I'm not particularly fussy - if you've considered this approach and
> rejected it for some reason that's fine.

Nope - it was this way in previous revisions and this seemed cleaner. 
Reverted to the older approach.

>> +def _parse_prefixes(subject_prefixes, regex):
>> +    for prefix in subject_prefixes:
>> +        m = regex.match(prefix)
>> +        if m:
>> +            return m
>> +
>> +
> _first_prefix_match or _first_matching_prefix perhaps?

Done.

>> +    regex = re.compile('^[vV](\d+)$')
>> +    m = _parse_prefixes(subject_prefixes, regex)
>> +    if m:
>> +        return int(m.group(1))
>> +
>> +    m = re.search(r'\([vV](\d+)\)', subject)
> Should this regex be precompiled at the head of the file? I also notice
> you've attempted to compile it at the head of the function, you're just
> not using the compiled version here...

Is seems Python caches this for us anyway [1], so I see no real reason. 
We compile in the first case simply for readability's sake, however, the 
regexes are not the same so we can't reuse it for the second case (note 
the additional brackets).

[1] http://stackoverflow.com/a/452143/613428

>> +        series = find_series(mail)
>> +        if not series and n:  # the series markers indicates a series
>> +            series = SeriesRevision(date=date,
>> +                                    submitter=author,
>> +                                    version=version,
>> +                                    total=n)
>> +            series.save()
>> +
>> +            for ref in refs + [msgid]:  # save references for series
>> +                # we don't want duplicates
>> +                SeriesReference.objects.get_or_create(series=series,
>> msgid=ref)
> 
> I must admit I don't quite understand the benefit of creating all the
> series references straight away. What is the advantage of creating them
> here as opposed to when the subsequent patches are received? (I could 
> be
> completely misunderstanding some key part of the parsing here -
> apologies if so!)

The key reason is to handle patches received out-of-order. Without 
storing references, we have no way to identify a relationship between 
the two patches as the RFC822 headers (References, Message-Id, 
In-Reply-To) only refer to older, prior mails. This fixes that issue.

FWIW, another idea was to store the root message ID against the series 
(I think the freedesktop fork does this), but doing so would require 
some magic for deeply threaded series (where each patch in sent in reply 
to the previous patch). In addition, I don't think any amount of magic 
would work for series sent in reply to a previous series.

>> +            try:
>> +                series = 
>> SeriesReference.objects.get(msgid=msgid).series
>> +            except SeriesReference.DoesNotExist:
>> +                series = None
> django shortcuts has a get_object_or_None - I'll leave it to you to
> weigh whether or not the brevity is worth the additional dependency.

I don't think it is. TBH, if we could get rid of the non-Django 
dependencies we do have, I'm sure we'd make some kernel.org sysadmins 
very happy. Alas, django-rest-framework is far too good to ignore.

> The tests and the rest of the code look good!

Thanks for the reviews :)

Stephen


More information about the Patchwork mailing list