[PATCH] parser: remove in-reply-to/references comments
emilyshaffer at google.com
Fri Jun 25 04:52:45 AEST 2021
On Wed, Jun 23, 2021 at 12:51:46PM -0400, Raxel Gutierrez wrote:
> When contributors using Gnus reply to a patch, the In-Reply-To &
> References header fields include comments in the form described in
> remove_rfc2822_comments spec. These comments can lead to threading
> issues where users' replies don't get associated to the patch they
> reply to because the added comment will be part of the message-id when
> parsed. Following RFC2822, a comment is not supposed to affect threading
> in any way, so we remove it. To use regex, we assume that comments are
> not nested; further changes to more thoroughly match the RFC2822 grammar
> remains for anyone interested.
Hm, does the nesting level of the comments really matter? Or is the
issue that they may be multiline?
That is - it's pretty trivial to write a regex to match
(foo(bar)baz((quot)meh)), as long as you don't actually care about the
semantics of the nesting.
> Signed-off-by: Raxel Gutierrez <raxel at google.com>
> Closes: #399
> patchwork/parser.py | 25 +++++++++++++++++--
> .../notes/issue-399-584c5be5b71dcf63.yaml | 7 ++++++
> 2 files changed, 30 insertions(+), 2 deletions(-)
> create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 61a8124..683ff55 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -70,6 +70,27 @@ def normalise_space(value):
> return whitespace_re.sub(' ', value).strip()
> +def remove_rfc2822_comments(header_contents):
> + """Removes RFC2822 comments from header fields.
> + Gnus create reply emails with commments like In-Reply-To/References:
> + <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
> + Patchwork parses the values of the In-Reply-To & References header fields
> + with the comment included as part of their value. A side effect of the
> + comment not being removed is that message-ids are mismatched. These
> + comments do not provide useful information for processing patches
> + because they are ignored for threading and not rendered by mail readers.
> + """
> + # Captures comments in header fields.
Firstly, I'd like to point out for other reviewers that Raxel commented
the expression this way because I told him to - if you hate it, blame
me, not him ;)
> + comment_pattern = re.compile(r"""
> + \( # The opening parenthesis of comment
> + [^()]* # The contents of the comment
I *think* this is the bit that's making it not support nesting.
"Match anything besides another open- or close-paren".
https://docs.python.org/3/library/re.html tells me that Python treats
'*' as greedy by default, so wouldn't "\(.*\)" handle nested comments?
Or is there an issue that you can have more that one, e.g.
In-Reply-To: (danica's mail) abcd1-40-8d at mail.google.com (from gnus)
in which case greedy-matching would also obliterate the actual
This actually brings to mind that I'd like to see an example of one such
problematic line in the commit message, if you've got one handy.
> + \) # The closing parenthesis of comment
> + """, re.X)
> + return re.sub(comment_pattern, '', header_contents)
> def sanitise_header(header_contents, header_name=None):
> """Clean and individual mail header.
> @@ -483,13 +504,13 @@ def find_references(mail):
> if 'In-Reply-To' in mail:
> for in_reply_to in mail.get_all('In-Reply-To'):
> - r = clean_header(in_reply_to)
> + r = remove_rfc2822_comments(clean_header(in_reply_to))
> if r:
> if 'References' in mail:
> for references_header in mail.get_all('References'):
> - h = clean_header(references_header)
> + h = remove_rfc2822_comments(clean_header(references_header))
> if not h:
> references = h.split()
> diff --git a/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> new file mode 100644
> index 0000000..a3e2ec1
> --- /dev/null
> +++ b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
> @@ -0,0 +1,7 @@
> + - |
> + Gnus users' replies would not be associated to the patch they reply to
> + because their In-Reply-To & References field generated explanatory
> + comments that were added to the value of the message-id parsed.
> + (`#399 <https://github.com/getpatchwork/patchwork/issues/399>`__)
Hum. Is there a test suite we can add a regression test to for this
specific kind of line format?
Sorry not to have left any lines on the Python style - I'm the last
person you should ask. ;) Please don't consider my lack of comment to be
an approval, there.
More information about the Patchwork