[PATCH] parser: Fix parsing of pull request emails with CRLF line endings on Python 2

Stephen Finucane stephen at that.guru
Fri Jan 19 00:24:51 AEDT 2018


On Tue, 2018-01-09 at 23:56 +0000, Stephen Finucane wrote:
> On Tue, 2018-01-09 at 12:01 +1100, Andrew Donnellan wrote:
> > On 09/01/18 11:56, Daniel Axtens wrote:
> > > > diff --git a/patchwork/parser.py b/patchwork/parser.py
> > > > index 1568bc4..7c677db 100644
> > > > --- a/patchwork/parser.py
> > > > +++ b/patchwork/parser.py
> > > > @@ -666,9 +666,13 @@ def clean_content(content):
> > > >       """Remove cruft from the email message.
> > > >   
> > > >       Catch signature (-- ) and list footer (_____) cruft.
> > > > +
> > > > +    Change to Unix line endings (the Python 3 email module does this for us,
> > > > +    but not Python 2).
> > > >       """
> > > >       sig_re = re.compile(r'^(-- |_+)\n.*', re.S | re.M)
> > > >       content = sig_re.sub('', content)
> > > > +    content = content.replace('\r\n', '\n')
> > > 
> > > Shouldn't this go before the removal of signatures?
> > 
> > Good point
> 
> Pending this change, this looks good to me. I'll leave the actual
> applying to Daniel though, in case he has more comments.
> 
> Reviewed-by: Stephen Finucane <stephen at that.guru>
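
This sketch illustrates the ordering point raised above. With CRLF input, the `\n` in `sig_re` never matches a `-- \r\n` line, so the signature survives unless line endings are converted first. The sketch reproduces only the two lines from the diff. `clean_content_old_order` is a hypothetical name used for comparison, and the rest of the real `clean_content()` is omitted:

```python
import re

# Same pattern as in the diff; only the two steps under discussion are
# reproduced here, not the full clean_content() from patchwork/parser.py.
SIG_RE = re.compile(r'^(-- |_+)\n.*', re.S | re.M)


def clean_content(content):
    """Normalise line endings first, then strip signature/footer cruft."""
    # CRLF -> LF before matching, so the '\n' in SIG_RE also matches
    # lines that originally ended in '\r\n'.
    content = content.replace('\r\n', '\n')
    return SIG_RE.sub('', content)


def clean_content_old_order(content):
    """Hypothetical: the order in the patch as posted, for comparison."""
    # '-- \r\n' does not match '-- \n', so nothing is stripped here...
    content = SIG_RE.sub('', content)
    # ...and converting afterwards leaves the signature in place.
    return content.replace('\r\n', '\n')
```

With `'Body text\r\n-- \r\nSig\r\n'` as input, `clean_content` returns `'Body text\n'`, while `clean_content_old_order` returns `'Body text\n-- \nSig\n'`, with the signature still attached.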

As an aside, we could also just open files with universal newlines [1]
in the parse_mail/parse_archive commands. Not sure if there are any
advantages to doing this (would you ever have reason to mix CRLF and
LF?).
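
The following is a rough sketch of what reading a mail file with universal newlines might look like. `read_mail_text` and `roundtrip` are hypothetical helpers, not existing Patchwork functions, and UTF-8 input is assumed. `io.open` is used because it accepts `newline=` on both Python 2 and 3:

```python
import io
import os
import tempfile


def read_mail_text(path):
    """Hypothetical helper: read a mail file with universal newlines."""
    # newline=None enables universal newlines mode: '\r\n', '\r' and '\n'
    # are all translated to '\n' on read. Encoding is assumed to be UTF-8.
    with io.open(path, 'r', encoding='utf-8', newline=None) as f:
        return f.read()


def roundtrip(raw_bytes):
    """Write raw_bytes to a temporary file and read it back as text."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw_bytes)
        return read_mail_text(path)
    finally:
        os.remove(path)
```

In this mode mixed input is handled as well. `roundtrip(b'Subject: test\r\n\r\nline one\nline two\r\n')` gives `'Subject: test\n\nline one\nline two\n'`. The translation applies to the whole file, though, so a diff that deliberately carries CRLF line endings would also be rewritten.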

Stephen

[1] https://docs.python.org/3/glossary.html#term-universal-newlines

