[PATCH v2] parser: Unmangle From: headers that have been mangled for DMARC purposes

Daniel Axtens dja at axtens.net
Fri Oct 18 16:50:07 AEDT 2019


Applied with a release note.

Daniel Axtens <dja at axtens.net> writes:

> Andrew Donnellan <ajd at linux.ibm.com> writes:
>
>> To avoid triggering spam filters due to failed signature validation, many
>> mailing lists mangle the From header to change the From address to be the
>> address of the list, typically where the sender's domain has a strict DMARC
>> policy enabled.
>>
>> In this case, we should try to unmangle the From header.
>>
>> Add support for using the X-Original-From or Reply-To headers, as used by
>> Google Groups and Mailman respectively, to unmangle the From header when
>> necessary and associate the patch with the correct submitter based on the
>> unmangled email address.
>>
>> When downloading mboxes, rewrite the From header using the unmangled
>> address, and preserve the original header as X-Patchwork-Original-From in
>> case someone needs it for some reason. The original From header will still
>> be stored in the database and exposed via the API, as we want to keep
>> messages as close to the original received format as possible.
>>
>> Closes: #64 ("Incorrect submitter when using googlegroups")
>> Reported-by: Alexandre Belloni <alexandre.belloni at bootlin.com>
>> Reported-by: Stephen Rothwell <sfr at canb.auug.org.au>
>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>
> Tested-by: Daniel Axtens <dja at axtens.net> # mailman only
>
> I'll apply it shortly to master. You could convince me to get it
> backported to stable if you really wanted.
>
> Do we need something to go through the db and fix up old ones? Or is it
> best to let sleeping patches lie?
>
> Regards,
> Daniel
>>
>> ---
>>
>> v1->v2:
>> - use X-Original-From rather than X-Original-Sender
>> - unmangle From header when downloading mbox
>>
>> rewrite from header
>>
>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>>
>> use x original from
>>
>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>> ---
>>  patchwork/parser.py               | 75 +++++++++++++++++++++++++++----
>>  patchwork/tests/test_mboxviews.py | 21 +++++++++
>>  patchwork/tests/test_parser.py    | 68 ++++++++++++++++++++++++++--
>>  patchwork/views/utils.py          | 12 +++++
>>  4 files changed, 163 insertions(+), 13 deletions(-)
>>
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 7dc66bc05a5b..be1e51652dd3 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -321,12 +321,7 @@ def find_series(project, mail, author):
>>      return _find_series_by_markers(project, mail, author)
>>  
>>  
>> -def get_or_create_author(mail):
>> -    from_header = clean_header(mail.get('From'))
>> -
>> -    if not from_header:
>> -        raise ValueError("Invalid 'From' header")
>> -
>> +def split_from_header(from_header):
>>      name, email = (None, None)
>>  
>>      # tuple of (regex, fn)
>> @@ -355,6 +350,65 @@ def get_or_create_author(mail):
>>              (name, email) = fn(match.groups())
>>              break
>>  
>> +    return (name, email)
>> +
>> +
>> +# Unmangle From addresses that have been mangled for DMARC purposes.
>> +#
>> +# To avoid triggering spam filters due to failed signature validation, many
>> +# mailing lists mangle the From header to change the From address to be the
>> +# address of the list, typically where the sender's domain has a strict
>> +# DMARC policy enabled.
>> +#
>> +# Unfortunately, there's no standardised way of preserving the original
>> +# From address.
>> +#
>> +# Google Groups adds an X-Original-From header. If present, we use that.
>> +#
>> +# Mailman preserves the original address by adding a Reply-To, except in the
>> +# case where the list is set to either reply to list, or reply to a specific
>> +# address, in which case the original From is added to Cc instead. These corner
>> +# cases are dumb, but we try and handle things as sensibly as possible by
>> +# looking for a name in Reply-To/Cc that matches From. It's inexact but should
>> +# be good enough for our purposes.
>> +def get_original_sender(mail, name, email):
>> +    if name and ' via ' in name:
>> +        # Mailman uses the format "<name> via <list>"
>> +        # Google Groups uses "'<name>' via <list>"
>> +        stripped_name = name[:name.rfind(' via ')].strip().strip("'")
>> +
>> +    original_from = clean_header(mail.get('X-Original-From', ''))
>> +    if original_from:
>> +        new_email = split_from_header(original_from)[1].strip()[:255]
>> +        return (stripped_name, new_email)
>> +
>> +    addrs = []
>> +    reply_to_headers = mail.get_all('Reply-To') or []
>> +    cc_headers = mail.get_all('Cc') or []
>> +    for header in reply_to_headers + cc_headers:
>> +        header = clean_header(header)
>> +        addrs = header.split(",")
>> +        for addr in addrs:
>> +            new_name, new_email = split_from_header(addr)
>> +            if new_name:
>> +                new_name = new_name.strip()[:255]
>> +            if new_email:
>> +                new_email = new_email.strip()[:255]
>> +            if new_name == stripped_name:
>> +                return (stripped_name, new_email)
>> +
>> +    # If we can't figure out the original sender, just keep it as is
>> +    return (name, email)
>> +
>> +
>> +def get_or_create_author(mail, project=None):
>> +    from_header = clean_header(mail.get('From'))
>> +
>> +    if not from_header:
>> +        raise ValueError("Invalid 'From' header")
>> +
>> +    name, email = split_from_header(from_header)
>> +
>>      if not email:
>>          raise ValueError("Invalid 'From' header")
>>  
>> @@ -362,6 +416,9 @@ def get_or_create_author(mail):
>>      if name is not None:
>>          name = name.strip()[:255]
>>  
>> +    if project and email.lower() == project.listemail.lower():
>> +        name, email = get_original_sender(mail, name, email)
>> +
>>      # this correctly handles the case where we lose the race to create
>>      # the person and another process beats us to it. (If the record
>>      # does not exist, g_o_c invokes _create_object_from_params which
>> @@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None):
>>  
>>      if not is_comment and (diff or pull_url):  # patches or pull requests
>>          # we delay the saving until we know we have a patch.
>> -        author = get_or_create_author(mail)
>> +        author = get_or_create_author(mail, project)
>>  
>>          delegate = find_delegate_by_header(mail)
>>          if not delegate and diff:
>> @@ -1099,7 +1156,7 @@ def parse_mail(mail, list_id=None):
>>                  is_cover_letter = True
>>  
>>          if is_cover_letter:
>> -            author = get_or_create_author(mail)
>> +            author = get_or_create_author(mail, project)
>>  
>>              # we don't use 'find_series' here as a cover letter will
>>              # always be the first item in a thread, thus the references
>> @@ -1159,7 +1216,7 @@ def parse_mail(mail, list_id=None):
>>      if not submission:
>>          return
>>  
>> -    author = get_or_create_author(mail)
>> +    author = get_or_create_author(mail, project)
>>  
>>      try:
>>          comment = Comment.objects.create(
>> diff --git a/patchwork/tests/test_mboxviews.py b/patchwork/tests/test_mboxviews.py
>> index 3854a856c4db..8f67778dbacb 100644
>> --- a/patchwork/tests/test_mboxviews.py
>> +++ b/patchwork/tests/test_mboxviews.py
>> @@ -183,6 +183,27 @@ class MboxHeaderTest(TestCase):
>>                                          patch.url_msgid]))
>>          self.assertContains(response, from_email)
>>  
>> +    def test_dmarc_from_header(self):
>> +        """Validate 'From' header is rewritten correctly when DMARC-munged.
>> +
>> +        Test that when an email with a DMARC-munged From header is processed,
>> +        the From header will be unmunged and the munged address will be saved
>> +        as 'X-Patchwork-Original-From'.
>> +        """
>> +        orig_from_header = 'Person via List <list at example.com>'
>> +        rewritten_from_header = 'Person <person at example.com>'
>> +        project = create_project(listemail='list at example.com')
>> +        person = create_person(name='Person', email='person at example.com')
>> +        patch = create_patch(project=project,
>> +                             headers='From: ' + orig_from_header,
>> +                             submitter=person)
>> +        response = self.client.get(
>> +            reverse('patch-mbox', args=[patch.project.linkname,
>> +                                        patch.url_msgid]))
>> +        mail = email.message_from_string(response.content.decode())
>> +        self.assertEqual(mail['From'], rewritten_from_header)
>> +        self.assertEqual(mail['X-Patchwork-Original-From'], orig_from_header)
>> +
>>      def test_date_header(self):
>>          patch = create_patch()
>>          response = self.client.get(
>> diff --git a/patchwork/tests/test_parser.py b/patchwork/tests/test_parser.py
>> index e5a2fca3044e..85c6c52f0e4b 100644
>> --- a/patchwork/tests/test_parser.py
>> +++ b/patchwork/tests/test_parser.py
>> @@ -265,11 +265,23 @@ class SenderCorrelationTest(TestCase):
>>      """
>>  
>>      @staticmethod
>> -    def _create_email(from_header):
>> +    def _create_email(from_header, reply_tos=None, ccs=None,
>> +                      x_original_from=None):
>>          mail = 'Message-Id: %s\n' % make_msgid() + \
>> -               'From: %s\n' % from_header + \
>> -               'Subject: Tests\n\n'\
>> -               'test\n'
>> +               'From: %s\n' % from_header
>> +
>> +        if reply_tos:
>> +            mail += 'Reply-To: %s\n' % ', '.join(reply_tos)
>> +
>> +        if ccs:
>> +            mail += 'Cc: %s\n' % ', '.join(ccs)
>> +
>> +        if x_original_from:
>> +            mail += 'X-Original-From: %s\n' % x_original_from
>> +
>> +        mail += 'Subject: Tests\n\n'\
>> +            'test\n'
>> +
>>          return message_from_string(mail)
>>  
>>      def test_existing_sender(self):
>> @@ -311,6 +323,54 @@ class SenderCorrelationTest(TestCase):
>>          self.assertEqual(person_b._state.adding, False)
>>          self.assertEqual(person_b.id, person_a.id)
>>  
>> +    def test_mailman_dmarc_munging(self):
>> +        project = create_project()
>> +        real_sender = 'Existing Sender <existing at example.com>'
>> +        munged_sender = 'Existing Sender via List <{}>'.format(
>> +            project.listemail)
>> +        other_email = 'Other Person <other at example.com>'
>> +
>> +        # Unmunged author
>> +        mail = self._create_email(real_sender)
>> +        person_a = get_or_create_author(mail, project)
>> +        person_a.save()
>> +
>> +        # Single Reply-To
>> +        mail = self._create_email(munged_sender, [real_sender])
>> +        person_b = get_or_create_author(mail, project)
>> +        self.assertEqual(person_b._state.adding, False)
>> +        self.assertEqual(person_b.id, person_a.id)
>> +
>> +        # Single Cc
>> +        mail = self._create_email(munged_sender, [], [real_sender])
>> +        person_b = get_or_create_author(mail, project)
>> +        self.assertEqual(person_b._state.adding, False)
>> +        self.assertEqual(person_b.id, person_a.id)
>> +
>> +        # Multiple Reply-Tos and Ccs
>> +        mail = self._create_email(munged_sender, [other_email, real_sender],
>> +                                  [other_email, other_email])
>> +        person_b = get_or_create_author(mail, project)
>> +        self.assertEqual(person_b._state.adding, False)
>> +        self.assertEqual(person_b.id, person_a.id)
>> +
>> +    def test_google_dmarc_munging(self):
>> +        project = create_project()
>> +        real_sender = 'Existing Sender <existing at example.com>'
>> +        munged_sender = "'Existing Sender' via List <{}>".format(
>> +            project.listemail)
>> +
>> +        # Unmunged author
>> +        mail = self._create_email(real_sender)
>> +        person_a = get_or_create_author(mail, project)
>> +        person_a.save()
>> +
>> +        # X-Original-From header
>> +        mail = self._create_email(munged_sender, None, None, real_sender)
>> +        person_b = get_or_create_author(mail, project)
>> +        self.assertEqual(person_b._state.adding, False)
>> +        self.assertEqual(person_b.id, person_a.id)
>> +
>>  
>>  class SeriesCorrelationTest(TestCase):
>>      """Validate correct behavior of find_series."""
>> diff --git a/patchwork/views/utils.py b/patchwork/views/utils.py
>> index d65aa5816336..905807935459 100644
>> --- a/patchwork/views/utils.py
>> +++ b/patchwork/views/utils.py
>> @@ -18,6 +18,7 @@ from django.utils import six
>>  
>>  from patchwork.models import Comment
>>  from patchwork.models import Patch
>> +from patchwork.parser import split_from_header
>>  
>>  if settings.ENABLE_REST_API:
>>      from rest_framework.authtoken.models import Token
>> @@ -92,6 +93,17 @@ def _submission_to_mbox(submission):
>>          # [1] https://tools.ietf.org/html/rfc1847
>>          if key == 'Content-Type' and val == 'multipart/signed':
>>              continue
>> +
>> +        if key == 'From':
>> +            name, addr = split_from_header(val)
>> +            if addr == submission.project.listemail:
>> +                # If From: is the list address (typically DMARC munging), then
>> +                # use the submitter details (which are cleaned up in the
>> +                # parser) in the From: field so that the patch author details
>> +                # are correct when applied with git am.
>> +                mail['X-Patchwork-Original-From'] = val
>> +                val = mail['X-Patchwork-Submitter']
>> +
>>          mail[key] = val
>>  
>>      if 'Date' not in mail:
>> -- 
>> 2.20.1
>>
>> _______________________________________________
>> Patchwork mailing list
>> Patchwork at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/patchwork


More information about the Patchwork mailing list