[PATCH v6 3/8] parser: Add series parsing

Daniel Axtens dja at axtens.net
Wed Oct 26 20:24:07 AEDT 2016


Hi Stephen,

> +    for ref in find_references(mail, True):
> +        # try parsing by RFC5322 fields first
> +        try:
> +            return SeriesReference.objects.get(msgid=ref).series
> +        except SeriesReference.DoesNotExist:
> +            pass

Hopefully I've understood what's going on here:
 - first, check for a Series that directly matches this message's
   message id (hence the True to find_references)
 - then, check for a series that matches an In-Reply-To
 - then, check for a series that matches the References, from most
   recent to least recent (hopefully!)

Is that right?

I think I would prefer pulling out the check for the Message ID into a
different test, rather than the extra flag to find_references - I'm not
sure a message ID is a reference in the sense in the usual sense. But
I'm not particularly fussy - if you've considered this approach and
rejected it for some reason that's fine.

> +def _parse_prefixes(subject_prefixes, regex):
> +    for prefix in subject_prefixes:
> +        m = regex.match(prefix)
> +        if m:
> +            return m
> +
> +
_first_prefix_match or _first_matching_prefix perhaps?

> +    regex = re.compile('^[vV](\d+)$')
> +    m = _parse_prefixes(subject_prefixes, regex)
> +    if m:
> +        return int(m.group(1))
> +
> +    m = re.search(r'\([vV](\d+)\)', subject)
Should this regex be precompiled at the head of the file? I also notice
you've attempted to compile it at the head of the function, you're just
not using the compiled version here...

> +        series = find_series(mail)
> +        if not series and n:  # the series markers indicates a series
> +            series = SeriesRevision(date=date,
> +                                    submitter=author,
> +                                    version=version,
> +                                    total=n)
> +            series.save()
> +
> +            for ref in refs + [msgid]:  # save references for series
> +                # we don't want duplicates
> +                SeriesReference.objects.get_or_create(series=series,
> msgid=ref)

I must admit I don't quite understand the benefit of creating all the
series references straight away. What is the advantage of creating them
here as opposed to when the subsequent patches are received? (I could be
completely misunderstanding some key part of the parsing here -
apologies if so!)

> +            try:
> +                series = SeriesReference.objects.get(msgid=msgid).series
> +            except SeriesReference.DoesNotExist:
> +                series = None
django shortcuts has a get_object_or_None - I'll leave it to you to
weigh whether or not the brevity is worth the additional dependency.

The tests and the rest of the code look good!

Regards,
Daniel


More information about the Patchwork mailing list