[PATCH] parser: remove in-reply-to/references comments
Emily Shaffer
emilyshaffer at google.com
Fri Jun 25 04:52:45 AEST 2021
On Wed, Jun 23, 2021 at 12:51:46PM -0400, Raxel Gutierrez wrote:
>
> When contributors using Gnus reply to a patch, the In-Reply-To &
> References header fields include comments in the form described in
> remove_rfc2822_comments spec. These comments can lead to threading
> issues where users' replies don't get associated to the patch they
> reply to because the added comment will be part of the message-id when
> parsed. Following RFC2822, a comment is not supposed to affect threading
> in any way, so we remove it. To use regex, we assume that comments are
> not nested; further changes to more thoroughly match the RFC2822 grammar
> remains for anyone interested.
Hm, does the nesting level of the comments really matter? Or is the
issue that they may be multiline?
That is - it's pretty trivial to write a regex to match
(foo(bar)baz((quot)meh)), as long as you don't actually care about the
semantics of the nesting.
>
> Signed-off-by: Raxel Gutierrez <raxel at google.com>
> Closes: #399
> ---
> patchwork/parser.py | 25 +++++++++++++++++--
> .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++
> 2 files changed, 30 insertions(+), 2 deletions(-)
> create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
>
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 61a8124..683ff55 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -70,6 +70,27 @@ def normalise_space(value):
> return whitespace_re.sub(' ', value).strip()
>
>
> +def remove_rfc2822_comments(header_contents):
> + """Removes RFC2822 comments from header fields.
> +
> + Gnus create reply emails with commments like In-Reply-To/References:
> + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
> + Patchwork parses the values of the In-Reply-To & References header fields
> + with the comment included as part of their value. A side effect of the
> + comment not being removed is that message-ids are mismatched. These
> + comments do not provide useful information for processing patches
> + because they are ignored for threading and not rendered by mail readers.
> + """
> +
> + # Captures comments in header fields.
Firstly, I'd like to point out for other reviewers that Raxel commented
the expression this way because I told him to - if you hate it, blame
me, not him ;)
> + comment_pattern = re.compile(r"""
> + \( # The opening parenthesis of comment
> + [^()]* # The contents of the comment
I *think* this is the bit that's making it not support nesting.
"Match anything besides another open- or close-paren".
https://docs.python.org/3/library/re.html tells me that Python treats
'*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
Or is there an issue that you can have more that one, e.g.
In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
in which case greedy-matching would also obliterate the actual
message-id?
This actually brings to mind that I'd like to see an example of one such
problematic line in the commit message, if you've got one handy.
> + \) # The closing parenthesis of comment
> + """, re.X)
> + return re.sub(comment_pattern, '', header_contents)
> +
> +
> def sanitise_header(header_contents, header_name=None):
> """Clean and individual mail header.
>
> @@ -483,13 +504,13 @@ def find_references(mail):
>
> if 'In-Reply-To' in mail:
> for in_reply_to in mail.get_all('In-Reply-To'):
> - r = clean_header(in_reply_to)
> + r = remove_rfc2822_comments(clean_header(in_reply_to))
> if r:
> refs.append(r)
>
> if 'References' in mail:
> for references_header in mail.get_all('References'):
> - h = clean_header(references_header)
> + h = remove_rfc2822_comments(clean_header(references_header))
> if not h:
> continue
> references = h.split()
> diff --git a/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> new file mode 100644
> index 0000000..a3e2ec1
> --- /dev/null
> +++ b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> @@ -0,0 +1,7 @@
> +---
> +fixes:
> + - |
> + Gnus users' replies would not be associated to the patch they reply to
> + because their In-Reply-To & References field generated explanatory
> + comments that were added to the value of the message-id parsed.
> + (`#399 <https://github.com/getpatchwork/patchwork/issues/399>`__)
Hum. Is there a test suite we can add a regression test to for this
specific kind of line format?
Sorry not to have left any lines on the Python style - I'm the last
person you should ask. ;) Please don't consider my lack of comment to be
an approval, there.
- Emily
More information about the Patchwork
mailing list