[PATCH] parser: remove in-reply-to/references comments

Daniel Axtens dja at axtens.net
Fri Jun 25 13:40:58 AEST 2021


Hi,

Thanks for your contribution Raxel!

>> When contributors using Gnus reply to a patch, the In-Reply-To &
>> References header fields include comments in the form described in
>> remove_rfc2822_comments spec. These comments can lead to threading
>> issues where users' replies don't get associated to the patch they
>> reply to because the added comment will be part of the message-id when
>> parsed. Following RFC2822, a comment is not supposed to affect threading
>> in any way, so we remove it. To use regex, we assume that comments are
>> not nested; further changes to more thoroughly match the RFC2822 grammar
>> remains for anyone interested.
>
> Hm, does the nesting level of the comments really matter? Or is the
> issue that they may be multiline?
>
> That is - it's pretty trivial to write a regex to match
> (foo(bar)baz((quot)meh)), as long as you don't actually care about the
> semantics of the nesting.

https://datatracker.ietf.org/doc/html/rfc2822#section-3.2.3 suggests
that comments may nest and, if I am reading the spec correctly, that
they may be multi-line: comment can contain folding white-space and
folding white-space can contain CRLF.

However, given that this is the first time comments have mattered at
all, I'd be OK to ignore multi-line comments. I would like nested
comments to work though, unless it makes a real mess of things.

>> 
>> Signed-off-by: Raxel Gutierrez <raxel at google.com>
>> Closes: #399
>> ---
>>  patchwork/parser.py                           | 25 +++++++++++++++++--
>>  .../notes/issue-399-584c5be5b71dcf63.yaml     |  7 ++++++
>>  2 files changed, 30 insertions(+), 2 deletions(-)
>>  create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>> 
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 61a8124..683ff55 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -70,6 +70,27 @@ def normalise_space(value):
>>      return whitespace_re.sub(' ', value).strip()
>> 
>> 
>> +def remove_rfc2822_comments(header_contents):
>> +    """Removes RFC2822 comments from header fields.
>> +
>> +    Gnus create reply emails with commments like In-Reply-To/References:
>> +    <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
>> +    Patchwork parses the values of the In-Reply-To & References header fields
>> +    with the comment included as part of their value. A side effect of the
>> +    comment not being removed is that message-ids are mismatched. These
>> +    comments do not provide useful information for processing patches
>> +    because they are ignored for threading and not rendered by mail readers.
>> +    """
>> +
>> +    # Captures comments in header fields.
>
> Firstly, I'd like to point out for other reviewers that Raxel commented
> the expression this way because I told him to - if you hate it, blame
> me, not him ;)

If `tox -e flake8` is happy, I am happy :)

>> +    comment_pattern = re.compile(r"""
>> +                                \(      # The opening parenthesis of comment
>> +                                [^()]*  # The contents of the comment
> I *think* this is the bit that's making it not support nesting.
> "Match anything besides another open- or close-paren".
>
> https://docs.python.org/3/library/re.html tells me that Python treats
> '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
> Or is there an issue that you can have more that one, e.g.
>
>   In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
>
> in which case greedy-matching would also obliterate the actual
> message-id?
>
> This actually brings to mind that I'd like to see an example of one such
> problematic line in the commit message, if you've got one handy.

I've asked on the issue
(https://github.com/getpatchwork/patchwork/issues/399) to see if we can
get some examples. Ostensibly emacs generates them, but I use
emacs+notmuch and I don't see them so I think it might be gnus specific.

Kind Regards,
Daniel


More information about the Patchwork mailing list