[PATCH 08/10] parse(mail|archive): handle early fail within email module

Daniel Axtens dja at axtens.net
Thu Jun 29 00:06:51 AEST 2017


Andrew Donnellan <andrew.donnellan at au1.ibm.com> writes:

> On 28/06/17 17:48, Daniel Axtens wrote:
>> Certain really messed up email messages can cause a failure within
>> the email module (at least on py3). Catch this.
>>
>> Signed-off-by: Daniel Axtens <dja at axtens.net>
>> ---
>>  patchwork/management/commands/parsearchive.py |  9 ++++++++
>>  patchwork/management/commands/parsemail.py    | 31 ++++++++++++++++-----------
>>  2 files changed, 27 insertions(+), 13 deletions(-)
>>
>> diff --git a/patchwork/management/commands/parsearchive.py b/patchwork/management/commands/parsearchive.py
>> index a3c8360186c8..3aab58a45bcd 100644
>> --- a/patchwork/management/commands/parsearchive.py
>> +++ b/patchwork/management/commands/parsearchive.py
>> @@ -77,6 +77,15 @@ class Command(BaseCommand):
>>
>>          count = len(mbox)
>>
>> +        # detect broken mails in the mbox
>> +        # see earlyfail fuzz test on py3
>> +        try:
>> +            for m in mbox:
>> +                pass
>> +        except AttributeError:
>> +            logger.warning('Broken mbox/Maildir, aborting')
>> +            return
>> +
>
> This seems a bit non-obvious and could do with a little bit of explanation?

The message, or the code structure? I structured the code this way
rather than the more obvious

  try:
      mbox = [m for m in mbox]
  ...

because the more obvious way requires loading the entire mbox/maildir
into memory, and I was a bit worried about the memory consumption of that
when parsing a large mbox.
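
To illustrate the trade-off, here is a minimal standalone sketch (not the
patch itself). It walks the mailbox once and discards each message, so only
one message is held at a time, whereas the list comprehension above keeps
every message alive until the list is dropped. The names mbox_is_readable,
BrokenBox and make_sample_mbox are hypothetical, and BrokenBox only
simulates the AttributeError; it does not reproduce the real failure inside
the email module.

```python
import mailbox
import os
import tempfile


def mbox_is_readable(mbox):
    """Iterate over every message once, keeping none of them.

    Returns False if iteration raises AttributeError, which is the
    exception the patch catches; True otherwise.
    """
    try:
        for _ in mbox:
            pass
    except AttributeError:
        return False
    return True


class BrokenBox:
    """Hypothetical stand-in for a mailbox whose iteration fails part-way."""

    def __iter__(self):
        yield "first message"
        raise AttributeError("simulated failure inside the email module")


def make_sample_mbox():
    """Create a small, well-formed mbox in a temporary directory."""
    path = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    box = mailbox.mbox(path)
    box.add("From: a@example.com\nSubject: one\n\nfirst body\n")
    box.add("From: b@example.com\nSubject: two\n\nsecond body\n")
    box.flush()
    return box
```

The cost of this approach is that the mailbox is read twice: once for the
check and once for the real parse.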

I agree a more helpful comment would have been in order. Stephen, do you
want a v2 of this patch by itself? I can resend the series but it seems
a bit excessive... Or I could do a follow-up.

Regards,
Daniel

>
>>          logger.info('Parsing %d mails', count)
>>          for i, msg in enumerate(mbox):
>>              try:
>> diff --git a/patchwork/management/commands/parsemail.py b/patchwork/management/commands/parsemail.py
>> index 9adfb25b09e3..52ec8bc56899 100644
>> --- a/patchwork/management/commands/parsemail.py
>> +++ b/patchwork/management/commands/parsemail.py
>> @@ -58,20 +58,25 @@ class Command(base.BaseCommand):
>>      def handle(self, *args, **options):
>>          infile = args[0] if args else options['infile']
>>
>> -        if infile:
>> -            logger.info('Parsing mail loaded by filename')
>> -            if six.PY3:
>> -                with open(infile, 'rb') as file_:
>> -                    mail = email.message_from_binary_file(file_)
>> -            else:
>> -                with open(infile) as file_:
>> -                    mail = email.message_from_file(file_)
>> -        else:
>> -            logger.info('Parsing mail loaded from stdin')
>> -            if six.PY3:
>> -                mail = email.message_from_binary_file(sys.stdin.buffer)
>> +        try:
>> +            if infile:
>> +                logger.info('Parsing mail loaded by filename')
>> +                if six.PY3:
>> +                    with open(infile, 'rb') as file_:
>> +                        mail = email.message_from_binary_file(file_)
>> +                else:
>> +                    with open(infile) as file_:
>> +                        mail = email.message_from_file(file_)
>>              else:
>> -                mail = email.message_from_file(sys.stdin)
>> +                logger.info('Parsing mail loaded from stdin')
>> +                if six.PY3:
>> +                    mail = email.message_from_binary_file(sys.stdin.buffer)
>> +                else:
>> +                    mail = email.message_from_file(sys.stdin)
>> +        except AttributeError:
>> +            logger.warning("Broken email ignored")
>> +            return
>> +
>>          try:
>>              result = parse_mail(mail, options['list_id'])
>>              if result:
>>
>
> -- 
> Andrew Donnellan              OzLabs, ADL Canberra
> andrew.donnellan at au1.ibm.com  IBM Australia Limited

