[PATCH v6 3/8] parser: Add series parsing

Daniel Axtens dja at axtens.net
Thu Oct 27 10:19:01 AEDT 2016

Hi Stephen,

>> Hopefully I've understood what's going on here:
>>  - first, check for a Series that directly matches this message's
>>    message id (hence the True to find_references)
>>  - then, check for a series that matches an In-Reply-To
>>  - then, check for a series that matches the References, from most
>>    recent to least recent (hopefully!)
>> Is that right?
> Sure is. We prefer locality to the current email where possible. I've 
> added comments to clarify this.


>> I think I would prefer pulling out the check for the Message ID into a
>> different test, rather than the extra flag to find_references - I'm not
>> sure a message ID is a reference in the sense in the usual sense. But
>> I'm not particularly fussy - if you've considered this approach and
>> rejected it for some reason that's fine.
> Nope - it was this way in previous revisions and this seemed cleaner. 
> Reverted to the older approach.


>>> +def _parse_prefixes(subject_prefixes, regex):
>>> +    for prefix in subject_prefixes:
>>> +        m = regex.match(prefix)
>>> +        if m:
>>> +            return m
>>> +
>>> +
>> _first_prefix_match or _first_matching_prefix perhaps?
> Done.


>>> +    regex = re.compile('^[vV](\d+)$')
>>> +    m = _parse_prefixes(subject_prefixes, regex)
>>> +    if m:
>>> +        return int(m.group(1))
>>> +
>>> +    m = re.search(r'\([vV](\d+)\)', subject)
>> Should this regex be precompiled at the head of the file? I also notice
>> you've attempted to compile it at the head of the function, you're just
>> not using the compiled version here...
> Is seems Python caches this for us anyway [1], so I see no real reason. 
> We compile in the first case simply for readability's sake, however, the 
> regexes are not the same so we can't reuse it for the second case (note 
> the additional brackets).
> [1] http://stackoverflow.com/a/452143/613428

Ah - you're right, they are different - sorry I missed that. It's fine
as is then, thanks for clarifying.

>> I must admit I don't quite understand the benefit of creating all the
>> series references straight away. What is the advantage of creating them
>> here as opposed to when the subsequent patches are received? (I could 
>> be
>> completely misunderstanding some key part of the parsing here -
>> apologies if so!)
> The key reason is to handle patches received out-of-order. Without 
> storing references, we have no way to identify a relationship between 
> the two patches as the RFC822 headers (References, Message-Id, 
> In-Reply-To) only refer to older, prior mails. This fixes that issue.
> FWIW, another idea was to store the root message ID against the series 
> (I think the freedesktop fork does this), but doing so would require 
> some magic for deeply threaded series (where each patch in sent in reply 
> to the previous patch). In addition, I don't think any amount of magic 
> would work for series sent in reply to a previous series.
Ah, OK. I'm convinced then :)

>>> +            try:
>>> +                series = 
>>> SeriesReference.objects.get(msgid=msgid).series
>>> +            except SeriesReference.DoesNotExist:
>>> +                series = None
>> django shortcuts has a get_object_or_None - I'll leave it to you to
>> weigh whether or not the brevity is worth the additional dependency.
> I don't think it is. TBH, if we could get rid of the non-Django 
> dependencies we do have, I'm sure we'd make some kernel.org sysadmins 
> very happy. Alas, django-rest-framework is far too good to ignore.

That's a completely fair call.

For your next revision:
Reviewed-by: Daniel Axtens <dja at axtens.net>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 859 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/patchwork/attachments/20161027/c3d9f004/attachment-0001.sig>

More information about the Patchwork mailing list