[PATCH] parser: Fix parsing of pull request emails with CRLF line endings on Python 2

Daniel Axtens dja at axtens.net
Fri Jan 19 12:58:38 AEDT 2018


Stephen Finucane <stephen at that.guru> writes:

> On Tue, 2018-01-09 at 23:56 +0000, Stephen Finucane wrote:
>> On Tue, 2018-01-09 at 12:01 +1100, Andrew Donnellan wrote:
>> > On 09/01/18 11:56, Daniel Axtens wrote:
>> > > > diff --git a/patchwork/parser.py b/patchwork/parser.py
>> > > > index 1568bc4..7c677db 100644
>> > > > --- a/patchwork/parser.py
>> > > > +++ b/patchwork/parser.py
>> > > > @@ -666,9 +666,13 @@ def clean_content(content):
>> > > >       """Remove cruft from the email message.
>> > > >   
>> > > >       Catch signature (-- ) and list footer (_____) cruft.
>> > > > +
>> > > > +    Change to Unix line endings (the Python 3 email module does this for us,
>> > > > +    but not Python 2).
>> > > >       """
>> > > >       sig_re = re.compile(r'^(-- |_+)\n.*', re.S | re.M)
>> > > >       content = sig_re.sub('', content)
>> > > > +    content = content.replace('\r\n', '\n')
>> > > 
>> > > Shouldn't this go before the removal of signatures?
>> > 
>> > Good point
>> 
>> Pending this change, this looks good to me. I'll leave the actual
>> applying to Daniel though, in case he has more comments.
>> 
>> Reviewed-by: Stephen Finucane <stephen at that.guru>
>
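
To make the ordering point concrete, here is a minimal sketch that reuses the
regex from the hunk above. The two function names are hypothetical and exist
only for this comparison; they are not Patchwork's clean_content. With CRLF
input the signature marker is followed by '\r\n', while the pattern expects
'\n' directly after '-- ', so the signature is only caught once the line
endings have been normalised.

```python
import re

# Same pattern as in the quoted hunk.
SIG_RE = re.compile(r'^(-- |_+)\n.*', re.S | re.M)


def clean_sig_first(content):
    # Order as posted in the patch: strip the signature, then normalise.
    content = SIG_RE.sub('', content)
    return content.replace('\r\n', '\n')


def clean_crlf_first(content):
    # Order suggested in review: normalise line endings, then strip.
    content = content.replace('\r\n', '\n')
    return SIG_RE.sub('', content)


# Made-up message body with CRLF line endings and a signature.
crlf_mail = 'Commit message\r\n-- \r\nsignature\r\n'

print(repr(clean_sig_first(crlf_mail)))   # signature survives
print(repr(clean_crlf_first(crlf_mail)))  # signature removed
```

With the posted order the '-- ' line and everything after it are left in the
content; with the normalisation moved first, only the commit message remains.
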
> As an aside, we could also just open files with universal newlines [1]
> in the parse_mail/parse_archive commands. Not sure if there are any
> advantages to doing this (would you ever have reason to mix CRLF and
> LF?).
>

My only thought would be that we should treat emails as magical black
boxes and let Python's stdlib handle them for us. Mucking with universal
newlines is a step backwards there.
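
For reference, a minimal sketch of what universal newlines does at the file
level. This is plain stdlib behaviour shown on a throwaway temp file with
made-up contents; it is not taken from the parse_mail/parse_archive commands.

```python
import io
import os
import tempfile

# Write a small file that mixes CRLF and LF line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'first\r\nsecond\nthird\r\n')

# Binary mode: the bytes come back untouched.
with io.open(path, 'rb') as f:
    raw = f.read()

# Text mode with newline=None (the default): universal newlines,
# so '\r\n' and '\r' are both translated to '\n' on read.
with io.open(path, 'r', newline=None, encoding='ascii') as f:
    translated = f.read()

os.remove(path)

print(repr(raw))
print(repr(translated))
```

The translation happens before any parser sees the data, so whatever consumes
the text-mode stream never gets the original line endings.
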

Regards,
Daniel

> Stephen
>
> [1] https://docs.python.org/3/glossary.html#term-universal-newlines

