[PATCH] parser: Fix parsing of pull request emails with CRLF line endings on Python 2
Daniel Axtens
dja at axtens.net
Fri Jan 19 12:58:38 AEDT 2018
Stephen Finucane <stephen at that.guru> writes:
> On Tue, 2018-01-09 at 23:56 +0000, Stephen Finucane wrote:
>> On Tue, 2018-01-09 at 12:01 +1100, Andrew Donnellan wrote:
>> > On 09/01/18 11:56, Daniel Axtens wrote:
>> > > > diff --git a/patchwork/parser.py b/patchwork/parser.py
>> > > > index 1568bc4..7c677db 100644
>> > > > --- a/patchwork/parser.py
>> > > > +++ b/patchwork/parser.py
>> > > > @@ -666,9 +666,13 @@ def clean_content(content):
>> > > > """Remove cruft from the email message.
>> > > >
>> > > > Catch signature (-- ) and list footer (_____) cruft.
>> > > > +
>> > > > + Change to Unix line endings (the Python 3 email module does this for us,
>> > > > + but not Python 2).
>> > > > """
>> > > > sig_re = re.compile(r'^(-- |_+)\n.*', re.S | re.M)
>> > > > content = sig_re.sub('', content)
>> > > > + content = content.replace('\r\n', '\n')
>> > >
>> > > Shouldn't this go before the removal of signatures?
>> >
>> > Good point
>>
>> Pending this change, this looks good to me. I'll leave the actual
>> applying to Daniel though, in case he has more comments.
>>
>> Reviewed-by: Stephen Finucane <stephen at that.guru>
>
> As an aside, we could also just open files with universal newlines [1]
> in the parse_mail/parse_archive commands. Not sure if there are any
> advantages to doing this (would you ever have reason to mix CRLF and
> LF?).
>
My only thought would be that we should treat emails as magical black
boxes and let Python's stdlib handle them for us. Mucking with universal
newlines is a step backwards there.

Regards,
Daniel
> Stephen
>
> [1] https://docs.python.org/3/glossary.html#term-universal-newlines
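
For reference, here is a minimal sketch of why the ordering raised in the review matters. It reuses the regex from the quoted diff. The function names clean_content_sketch and strip_then_normalize are hypothetical and exist only for this illustration; this is not the actual patchwork code, which may do more than is shown here.

```python
import re

# Same pattern as in the quoted diff: a signature line ("-- ") or a
# list-footer line made of underscores, followed by everything after it.
SIG_RE = re.compile(r'^(-- |_+)\n.*', re.S | re.M)


def clean_content_sketch(content):
    """Hypothetical variant: normalise line endings first, then strip cruft."""
    content = content.replace('\r\n', '\n')
    return SIG_RE.sub('', content)


def strip_then_normalize(content):
    """Hypothetical variant with the order used in the quoted diff."""
    content = SIG_RE.sub('', content)
    return content.replace('\r\n', '\n')


# A CRLF-terminated message with a signature block.
crlf_mail = 'Body text\r\n-- \r\nJane\r\n'

# Normalising first lets the regex see "-- \n" and drop the signature.
normalized_first = clean_content_sketch(crlf_mail)

# Stripping first leaves the signature in place, because "-- " is
# followed by "\r" rather than "\n" when the regex runs.
stripped_first = strip_then_normalize(crlf_mail)
```

With CRLF input, only the first variant removes the signature, which is the reason for moving the replace() call above the sig_re.sub() call.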