[PATCH] parser: remove in-reply-to/references comments

Daniel Axtens dja at axtens.net
Fri Jun 25 19:09:42 AEST 2021


Daniel Axtens <dja at axtens.net> writes:
> However, given that this is the first time comments have mattered at
> all, I'd be OK to ignore multi-line comments. I would like nested
> comments to work though, unless it makes a real mess of things.

Ah, so it turns out I'm totally wrong here: we've been given some sample
data on the GH issue (http://www.delorie.com/tmp/patchwork-399-1.txt)
and it contains a multi-line comment:

In-Reply-To: <4574b99b-edac-d8dc-9141-79c3109d2fcc at huawei.com> (message from
 liqingqing on Thu, 1 Apr 2021 16:51:45 +0800)

I don't know if Python's email module will unfold multi-line headers
automatically - it's very possible it does - or whether the regex will work
across multiple lines... I lose track of which regex engine does what!
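A minimal sketch that checks both questions with the standard library. It assumes a hypothetical message whose In-Reply-To is folded like the sample above; the msg-id is a placeholder. It does not assume which `email` policy Patchwork's parser actually uses. `message_from_string()` defaults to the `compat32` policy, which keeps the line break in the value, while `email.policy.default` unfolds it. In both cases a negated character class such as `[^()]` matches a newline without `re.DOTALL`, whereas `.` does not.

```python
import email
import email.policy
import re

# Hypothetical message: In-Reply-To is folded like the sample data,
# but the msg-id is a placeholder, not the real one.
RAW = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "In-Reply-To: <abc123@example.com> (message from\n"
    " liqingqing on Thu, 1 Apr 2021 16:51:45 +0800)\n"
    "\n"
    "body\n"
)

# message_from_string() uses the compat32 policy by default, which keeps the fold.
folded = email.message_from_string(RAW)["In-Reply-To"]

# email.policy.default unfolds the header (the line break is removed).
unfolded = str(
    email.message_from_string(RAW, policy=email.policy.default)["In-Reply-To"]
)

# [^()] is a negated class, so it matches "\n" even without re.DOTALL.
comment_re = re.compile(r"\([^()]*\)")

# "." does not match "\n" unless re.DOTALL is set.
dot_re = re.compile(r"\(.*\)")

print(repr(folded))
print(comment_re.sub("", folded).strip())
print(comment_re.sub("", unfolded).strip())
print(dot_re.search(folded))
```

With the folded value, the `[^()]*` pattern still removes the comment, but the `.*` version finds no match at all because its `.` stops at the line break.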

Kind regards,
Daniel

>
>>> 
>>> Signed-off-by: Raxel Gutierrez <raxel at google.com>
>>> Closes: #399
>>> ---
>>>  patchwork/parser.py                           | 25 +++++++++++++++++--
>>>  .../notes/issue-399-584c5be5b71dcf63.yaml     |  7 ++++++
>>>  2 files changed, 30 insertions(+), 2 deletions(-)
>>>  create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>>> 
>>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>>> index 61a8124..683ff55 100644
>>> --- a/patchwork/parser.py
>>> +++ b/patchwork/parser.py
>>> @@ -70,6 +70,27 @@ def normalise_space(value):
>>>      return whitespace_re.sub(' ', value).strip()
>>> 
>>> 
>>> +def remove_rfc2822_comments(header_contents):
>>> +    """Removes RFC2822 comments from header fields.
>>> +
>>> +    Gnus creates reply emails with comments like In-Reply-To/References:
>>> +    <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
>>> +    Patchwork parses the values of the In-Reply-To & References header fields
>>> +    with the comment included as part of their value. A side effect of the
>>> +    comment not being removed is that message-ids are mismatched. These
>>> +    comments do not provide useful information for processing patches
>>> +    because they are ignored for threading and not rendered by mail readers.
>>> +    """
>>> +
>>> +    # Captures comments in header fields.
>>
>> Firstly, I'd like to point out for other reviewers that Raxel commented
>> the expression this way because I told him to - if you hate it, blame
>> me, not him ;)
>
> If `tox -e flake8` is happy, I am happy :)
>
>>> +    comment_pattern = re.compile(r"""
>>> +                                \(      # The opening parenthesis of comment
>>> +                                [^()]*  # The contents of the comment
>> I *think* this is the bit that's making it not support nesting.
>> "Match anything besides another open- or close-paren".
>>
>> https://docs.python.org/3/library/re.html tells me that Python treats
>> '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
>> Or is there an issue that you can have more than one, e.g.
>>
>>   In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
>>
>> in which case greedy-matching would also obliterate the actual
>> message-id?
>>
>> This actually brings to mind that I'd like to see an example of one such
>> problematic line in the commit message, if you've got one handy.
>
> I've asked on the issue
> (https://github.com/getpatchwork/patchwork/issues/399) to see if we can
> get some examples. Ostensibly emacs generates them, but I use
> emacs+notmuch and I don't see them, so I think it might be Gnus-specific.
>
> Kind Regards,
> Daniel
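On the nesting question in the quoted review: Python's built-in `re` module has no recursive patterns, so a single `re` expression cannot balance arbitrarily nested parentheses. The sketch below first reproduces the greedy-matching concern using an adapted version of the two-comment example (placeholder msg-id, with the "@" restored). It then strips comments with a small depth-counting scanner, `strip_comments()`, a hypothetical helper rather than the function from the patch. It assumes msg-ids contain no parentheses and does not handle quoted strings. It does skip backslash quoted-pairs inside comments, which RFC 5322 allows.

```python
import re

# Adapted from the two-comment example in the review; placeholder msg-id.
MULTI = "(danica's mail) <abcd1-40-8d@mail.example.com> (from gnus)"
NESTED = "<a@example.com> (reply to (quoted) note)"

# Greedy ".*" runs from the first "(" to the last ")", eating the msg-id too.
greedy = re.sub(r"\(.*\)", "", MULTI).strip()

# The non-nesting class from the patch handles MULTI, but on NESTED it only
# removes the inner "(quoted)" and leaves the outer comment's text behind.
flat_multi = re.sub(r"\([^()]*\)", "", MULTI).strip()
flat_nested = re.sub(r"\([^()]*\)", "", NESTED).strip()


def strip_comments(value: str) -> str:
    """Remove (possibly nested) RFC 5322 comments from a header value.

    Hypothetical helper: counts parenthesis depth, skips backslash
    quoted-pairs inside comments, and collapses the remaining whitespace
    (including folded line breaks). An unclosed "(" drops the rest of the
    value; a stray ")" outside any comment is kept.
    """
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if depth and ch == "\\":
            i += 2  # quoted-pair: skip the backslash and the escaped char
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return " ".join("".join(out).split())


print(repr(greedy))
print(flat_multi)
print(flat_nested)
print(strip_comments(MULTI))
print(strip_comments(NESTED))
```

The scanner trades a one-line regex for explicit state. If a regex is preferred, the third-party `regex` module does support recursive patterns, at the cost of a new dependency.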

