[PATCH] parser: Fix parsing of pull request emails with CRLF line endings on Python 2
Andrew Donnellan
andrew.donnellan at au1.ibm.com
Fri Jan 19 10:38:57 AEDT 2018
On 19/01/18 00:24, Stephen Finucane wrote:
> On Tue, 2018-01-09 at 23:56 +0000, Stephen Finucane wrote:
>> On Tue, 2018-01-09 at 12:01 +1100, Andrew Donnellan wrote:
>>> On 09/01/18 11:56, Daniel Axtens wrote:
>>>>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>>>>> index 1568bc4..7c677db 100644
>>>>> --- a/patchwork/parser.py
>>>>> +++ b/patchwork/parser.py
>>>>> @@ -666,9 +666,13 @@ def clean_content(content):
>>>>> """Remove cruft from the email message.
>>>>>
>>>>> Catch signature (-- ) and list footer (_____) cruft.
>>>>> +
>>>>> + Change to Unix line endings (the Python 3 email module does this for us,
>>>>> + but not Python 2).
>>>>> """
>>>>> sig_re = re.compile(r'^(-- |_+)\n.*', re.S | re.M)
>>>>> content = sig_re.sub('', content)
>>>>> + content = content.replace('\r\n', '\n')
>>>>
>>>> Shouldn't this go before the removal of signatures?
>>>
>>> Good point
>>
>> Pending this change, this looks good to me. I'll leave the actual
>> applying to Daniel though, in case he has more comments.
>>
>> Reviewed-by: Stephen Finucane <stephen at that.guru>
>
> As an aside, we could also just open files with universal newlines [1]
> in the parse_mail/parse_archive commands. Not sure if there are any
> advantages to doing this (would you ever have reason to mix CRLF and
> LF?).
Yeah, I did think about that, but on Py3 we currently open the files in
binary mode and I wasn't sure whether that was going to break anything.
Let me see...
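
For illustration only, here is a minimal sketch of the two options discussed
above. It is not the actual Patchwork code: the regex is copied from the quoted
diff, but every function name below is hypothetical. It shows why the CRLF
replacement probably needs to run before the signature regex (the pattern
expects "\n" directly after "-- ", so a "\r\n" ending stops it from matching),
and what reading with universal newlines would hand to the parser compared
with binary mode.

```python
import io
import re

# Same pattern as in the quoted diff.
SIG_RE = re.compile(r'^(-- |_+)\n.*', re.S | re.M)


def strip_then_normalize(content):
    """Order used in the patch as posted (hypothetical helper name)."""
    content = SIG_RE.sub('', content)
    return content.replace('\r\n', '\n')


def normalize_then_strip(content):
    """Order suggested in review (hypothetical helper name)."""
    content = content.replace('\r\n', '\n')
    return SIG_RE.sub('', content)


def read_universal(path):
    """Read text with universal newlines; CRLF is translated to LF.

    io.open() is used because it behaves the same on Python 2 and 3.
    The 'ascii' encoding is an assumption made to keep the sketch simple.
    """
    with io.open(path, 'r', encoding='ascii', newline=None) as f:
        return f.read()


def read_binary(path):
    """Read raw bytes; line endings are left untouched."""
    with io.open(path, 'rb') as f:
        return f.read()
```

With a CRLF message, strip_then_normalize() leaves the signature in place
because the regex never matches, while normalize_then_strip() removes it.
read_universal() would make the replace() call unnecessary for files, at the
cost of no longer handing raw bytes to the parser.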
--
Andrew Donnellan OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com IBM Australia Limited