[PATCH] parser: remove in-reply-to/references comments
dja at axtens.net
Fri Jun 25 13:40:58 AEST 2021
Thanks for your contribution Raxel!
>> When contributors using Gnus reply to a patch, the In-Reply-To &
>> References header fields include comments in the form described in
>> remove_rfc2822_comments spec. These comments can lead to threading
>> issues where users' replies don't get associated to the patch they
>> reply to because the added comment will be part of the message-id when
>> parsed. Following RFC2822, a comment is not supposed to affect threading
>> in any way, so we remove it. To use regex, we assume that comments are
>> not nested; further changes to more thoroughly match the RFC2822 grammar
>> remains for anyone interested.
> Hm, does the nesting level of the comments really matter? Or is the
> issue that they may be multiline?
> That is - it's pretty trivial to write a regex to match
> (foo(bar)baz((quot)meh)), as long as you don't actually care about the
> semantics of the nesting.
that comments may nest and, if I am reading the spec correctly, that
they may be multi-line: comment can contain folding white-space and
folding white-space can contain CRLF.
However, given that this is the first time comments have mattered at
all, I'd be OK to ignore multi-line comments. I would like nested
comments to work though, unless it makes a real mess of things.
>> Signed-off-by: Raxel Gutierrez <raxel at google.com>
>> Closes: #399
>> patchwork/parser.py | 25 +++++++++++++++++--
>> .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++
>> 2 files changed, 30 insertions(+), 2 deletions(-)
>> create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 61a8124..683ff55 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -70,6 +70,27 @@ def normalise_space(value):
>> return whitespace_re.sub(' ', value).strip()
>> +def remove_rfc2822_comments(header_contents):
>> + """Removes RFC2822 comments from header fields.
>> + Gnus create reply emails with commments like In-Reply-To/References:
>> + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
>> + Patchwork parses the values of the In-Reply-To & References header fields
>> + with the comment included as part of their value. A side effect of the
>> + comment not being removed is that message-ids are mismatched. These
>> + comments do not provide useful information for processing patches
>> + because they are ignored for threading and not rendered by mail readers.
>> + """
>> + # Captures comments in header fields.
> Firstly, I'd like to point out for other reviewers that Raxel commented
> the expression this way because I told him to - if you hate it, blame
> me, not him ;)
If `tox -e flake8` is happy, I am happy :)
>> + comment_pattern = re.compile(r"""
>> + \( # The opening parenthesis of comment
>> + [^()]* # The contents of the comment
> I *think* this is the bit that's making it not support nesting.
> "Match anything besides another open- or close-paren".
> https://docs.python.org/3/library/re.html tells me that Python treats
> '*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
> Or is there an issue that you can have more that one, e.g.
> In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
> in which case greedy-matching would also obliterate the actual
> This actually brings to mind that I'd like to see an example of one such
> problematic line in the commit message, if you've got one handy.
I've asked on the issue
(https://github.com/getpatchwork/patchwork/issues/399) to see if we can
get some examples. Ostensibly emacs generates them, but I use
emacs+notmuch and I don't see them so I think it might be gnus specific.
More information about the Patchwork