[PATCH] parser: remove in-reply-to/references comments

Raxel Gutierrez raxel at google.com
Thu Jun 24 02:51:46 AEST 2021


When contributors using Gnus reply to a patch, the In-Reply-To &
References header fields include comments in the form described in
remove_rfc2822_comments spec. These comments can lead to threading
issues where users' replies don't get associated to the patch they
reply to because the added comment will be part of the message-id when
parsed. Following RFC2822, a comment is not supposed to affect threading
in any way, so we remove it. To use regex, we assume that comments are
not nested; further changes to more thoroughly match the RFC2822 grammar
remains for anyone interested.

Signed-off-by: Raxel Gutierrez <raxel at google.com>
Closes: #399
---
 patchwork/parser.py                           | 25 +++++++++++++++++--
 .../notes/issue-399-584c5be5b71dcf63.yaml     |  7 ++++++
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 releasenotes/notes/issue-399-584c5be5b71dcf63.yaml

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 61a8124..683ff55 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -70,6 +70,27 @@ def normalise_space(value):
     return whitespace_re.sub(' ', value).strip()


+def remove_rfc2822_comments(header_contents):
+    """Removes RFC2822 comments from header fields.
+
+    Gnus create reply emails with commments like In-Reply-To/References:
+    <msg-id> (User's message of Sun, 01 Jan 2012 12:34:56 +0700) [comment].
+    Patchwork parses the values of the In-Reply-To & References header fields
+    with the comment included as part of their value. A side effect of the
+    comment not being removed is that message-ids are mismatched. These
+    comments do not provide useful information for processing patches
+    because they are ignored for threading and not rendered by mail readers.
+    """
+
+    # Captures comments in header fields.
+    comment_pattern = re.compile(r"""
+                                \(      # The opening parenthesis of comment
+                                [^()]*  # The contents of the comment
+                                \)      # The closing parenthesis of comment
+                                """, re.X)
+    return re.sub(comment_pattern, '', header_contents)
+
+
 def sanitise_header(header_contents, header_name=None):
     """Clean and individual mail header.

@@ -483,13 +504,13 @@ def find_references(mail):

     if 'In-Reply-To' in mail:
         for in_reply_to in mail.get_all('In-Reply-To'):
-            r = clean_header(in_reply_to)
+            r = remove_rfc2822_comments(clean_header(in_reply_to))
             if r:
                 refs.append(r)

     if 'References' in mail:
         for references_header in mail.get_all('References'):
-            h = clean_header(references_header)
+            h = remove_rfc2822_comments(clean_header(references_header))
             if not h:
                 continue
             references = h.split()
diff --git a/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
new file mode 100644
index 0000000..a3e2ec1
--- /dev/null
+++ b/releasenotes/notes/issue-399-584c5be5b71dcf63.yaml
@@ -0,0 +1,7 @@
+---
+fixes:
+  - |
+    Gnus users' replies would not be associated to the patch they reply to
+    because their In-Reply-To & References field generated explanatory
+    comments that were added to the value of the message-id parsed.
+    (`#399 <https://github.com/getpatchwork/patchwork/issues/399>`__)
-- 
2.32.0.288.g62a8d224e6-goog


More information about the Patchwork mailing list