[PATCH v2 09/10] parsemail: Convert to a management command

Sun Aug 28 17:06:05 AEST 2016

> +    def handle(self, *args, **options):
> +        # Attempt to parse the path if provided, and fallback to stdin if not
> +        if args:
> +            logger.info('Parsing mail loaded by filename')
> +            with open(args[0]) as file_:
> +                mail = message_from_file(file_)
> +        else:
> +            logger.info('Parsing mail loaded from stdin')
> +            mail = message_from_file(sys.stdin)
> +

So, I have found an interesting case here, not strictly related to this
patch but related to parsing messages from files.

I have been testing with some messages from this list from earlier this
month. One [0] includes the following sequence:

000018f0  69 65 73 20 76 69 65 77  29 20 3f c2 a0 20 48 6f  |ies view) ?.. Ho|

Note the sequence "c2 a0". Both bytes are >= 0x80 (128) and therefore
not part of 7-bit ASCII.

Apparently this is the UTF-8 encoding of a non-breaking space (U+00A0):
http://stackoverflow.com/a/2774507/463510

email.message_from_file does not handle this well under Python 3: it boils down to

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 6395: ordinal not in range(128)
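
For anyone who wants to reproduce this without the original mail, here is a
minimal Python 3 sketch. The byte string is just the fragment from the hexdump
above, not the full message, so the reported position will differ from the
6395 in the real traceback.

```python
# Fragment from the hexdump above: "?" followed by the bytes c2 a0.
raw = b"ies view) ?\xc2\xa0 Ho"

# Decoded as UTF-8, the two bytes become one character, U+00A0.
text = raw.decode("utf-8")
nbsp = text[11]

# Decoded as ASCII, the first byte >= 0x80 triggers the same class of error.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)
```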

I imagine this hasn't hit us in production because most (all?)
production users use Python2, which doesn't have the bytes/string
distinction that Python3 has.

Anyway, the only way I've found to work around this is to do something
like this:

with open(args[0], 'rb') as file_:
    decoded_mail = file_.read().decode('utf-8')
    mail = email.message_from_string(decoded_mail)

This is super ugly, but works in Py3. Ironically it doesn't work in Py2,
but it's a start. Could you include something like this in this patch
set? I think parsearchive will require something similar too.
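
One possible shape for a version that runs on both interpreters is sketched
below. It is only a sketch: parse_mail is a hypothetical helper name, not
something in the patch, and it assumes that handing the raw bytes to
email.message_from_binary_file (available since Python 3.2) is acceptable
instead of decoding the whole file as UTF-8 up front. I have not checked how
this interacts with the rest of the parser.

```python
import email
import sys


def parse_mail(path=None):
    """Hypothetical helper: parse a mail from `path`, or stdin if not given."""
    if sys.version_info[0] >= 3:
        # Python 3: feed bytes to the parser so no implicit ASCII decode
        # happens while reading the file.
        if path:
            with open(path, 'rb') as file_:
                return email.message_from_binary_file(file_)
        return email.message_from_binary_file(sys.stdin.buffer)

    # Python 2: str is already a byte string, so the text API is fine.
    if path:
        with open(path) as file_:
            return email.message_from_file(file_)
    return email.message_from_file(sys.stdin)
```

With this, handle() would call parse_mail(args[0]) when a path is given and
parse_mail() otherwise. The design choice is to let the email package deal
with the charset per part rather than assuming every mail is valid UTF-8.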

I'm going to start collecting these "interesting" emails to make a test suite.

Regards,
Daniel

[0] https://lists.ozlabs.org/pipermail/patchwork/2016-August/003158.html