[PATCH] parser: remove in-reply-to/references comments
Daniel Axtens
dja at axtens.net
Fri Jun 25 19:09:42 AEST 2021
Daniel Axtens <dja at axtens.net> writes:
> However, given that this is the first time comments have mattered at
> all, I'd be OK to ignore multi-line comments. I would like nested
> comments to work though, unless it makes a real mess of things.
Ah so it turns out I'm totally wrong here: we've been given some sample
data on the GH issue (http://www.delorie.com/tmp/patchwork-399-1.txt)
and it contains a multi-line comment:
In-Reply-To: <4574b99b-edac-d8dc-9141-79c3109d2fcc at huawei.com> (message from
liqingqing on Thu, 1 Apr 2021 16:51:45 +0800)
I don't know if Python's email module will unfold multi-line headers
automatically - it's very possible it does - or if the regex will work
over multiple lines... I lose track of which regex engine does what!
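
A quick way to check both questions, as a rough sketch rather than a
definitive answer: the message below is made up (example.com address,
invented sender), and the two patterns are simplified stand-ins for the
one in the patch. As far as I can tell, the legacy compat32 policy keeps
the line break inside the header value, policy.default strips it, and a
negated character class like [^()] matches newlines whereas '.' does not
unless re.DOTALL is set.

```python
import email
import re
from email import policy

# Made-up message with a folded In-Reply-To header (illustrative only).
raw = (
    "In-Reply-To: <abc@example.com> (message from\n"
    " someone on Thu, 1 Apr 2021 16:51:45 +0800)\n"
    "\n"
    "body\n"
)

# Default (compat32) policy: the fold is kept in the returned value.
folded = email.message_from_string(raw)["In-Reply-To"]
# policy.default: CR/LF characters are removed when the header is fetched.
unfolded = email.message_from_string(raw, policy=policy.default)["In-Reply-To"]

# Simplified stand-ins for the pattern under review.
negated = re.compile(r"\([^()]*\)")  # [^()] also matches "\n"
dot = re.compile(r"\(.*\)")          # "." stops at "\n" without re.DOTALL

negated_result = negated.sub("", folded).strip()
dot_matches = dot.search(folded) is not None

print(repr(folded))
print(repr(str(unfolded)))
print(negated_result)
print(dot_matches)
```

If that holds, the [^()]* form in the patch should cope with a folded
comment even when the header value still contains the line break.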
Kind regards,
Daniel
>
>>>
>>> Signed-off-by: Raxel Gutierrez <raxel at google.com>
>>> Closes: #399
>>> ---
>>> patchwork/parser.py | 25 +++++++++++++++++--
>>> .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++
>>> 2 files changed, 30 insertions(+), 2 deletions(-)
>>> create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>>>
>>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>>> index 61a8124..683ff55 100644
>>> --- a/patchwork/parser.py
>>> +++ b/patchwork/parser.py
>>> @@ -70,6 +70,27 @@ def normalise_space(value):
>>> return whitespace_re.sub(' ', value).strip()
>>>
>>>
>>> +def remove_rfc2822_comments(header_contents):
>>> + """Removes RFC2822 comments from header fields.
>>> +
>>> + Gnus creates reply emails with comments like In-Reply-To/References:
>>> + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
>>> + Patchwork parses the values of the In-Reply-To & References header fields
>>> + with the comment included as part of their value. A side effect of the
>>> + comment not being removed is that message-ids are mismatched. These
>>> + comments do not provide useful information for processing patches
>>> + because they are ignored for threading and not rendered by mail readers.
>>> + """
>>> +
>>> + # Captures comments in header fields.
>>
>> Firstly, I'd like to point out for other reviewers that Raxel commented
>> the expression this way because I told him to - if you hate it, blame
>> me, not him ;)
>
> If `tox -e flake8` is happy, I am happy :)
>
>>> + comment_pattern = re.compile(r"""
>>> + \( # The opening parenthesis of comment
>>> + [^()]* # The contents of the comment
>> I *think* this is the bit that's making it not support nesting.
>> "Match anything besides another open- or close-paren".
>>
>> https://docs.python.org/3/library/re.html tells me that Python treats
>> '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
>> Or is there an issue that you can have more than one, e.g.
>>
>> In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
>>
>> in which case greedy-matching would also obliterate the actual
>> message-id?
>>
>> This actually brings to mind that I'd like to see an example of one such
>> problematic line in the commit message, if you've got one handy.
>
> I've asked on the issue
> (https://github.com/getpatchwork/patchwork/issues/399) to see if we can
> get some examples. Ostensibly emacs generates them, but I use
> emacs+notmuch and I don't see them so I think it might be gnus specific.
>
> Kind Regards,
> Daniel
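
On the greedy-versus-nested question quoted above, here is a small
sketch of the trade-off. The header value is a made-up variant of the
example in the review, and strip_comments is a hypothetical helper, not
the function from the patch. It assumes comments are well-formed and
ignores quoted strings and backslash-escaped parentheses, which RFC 2822
also allows.

```python
import re

# Made-up header value with a comment on each side of the message-id.
header = "(danica's mail) <abcd1-40-8d@mail.example.com> (from gnus)"

greedy = re.compile(r"\(.*\)")      # first "(" through the last ")"
flat = re.compile(r"\([^()]*\)")    # one innermost comment at a time


def strip_comments(value):
    """Hypothetical helper: drop innermost comments until none remain."""
    previous = None
    while previous != value:
        previous = value
        value = flat.sub("", value)
    return value.strip()


# Greedy matching swallows the message-id between the two comments.
print(repr(greedy.sub("", header).strip()))  # ''

# Repeating the non-nested pattern keeps the id and handles nesting.
print(strip_comments(header))
print(strip_comments("<id@example.com> (outer (inner) comment)"))
```

Looping until the value stops changing peels nested comments from the
inside out, so the pattern itself does not need to understand nesting.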