UTF-8-challenged checkpatch on patchwork
konstantin at linuxfoundation.org
Wed Dec 2 05:56:07 AEDT 2020
On Tue, Dec 01, 2020 at 10:36:43AM -0800, Jakub Kicinski wrote:
> > Not sure if this is a bug in requests or not, but the corruption is
> > introduced in .text. Here's a POC:
> > ----
> > #!/usr/bin/env python3
> > import requests
> > pmbx = 'https://email@example.com/mbox/'
> > res = requests.get(pmbx)
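> > def sob(s):
> >     # .split() returns a list; pick the Signed-off-by line out of it
> >     return next(line for line in s.split('\n') if line.startswith('Signed-off-by'))
> > print('content: ' + sob(res.content.decode()))
> > print(' text: ' + sob(res.text))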
> > ----
> > $ python3 test.py
> > content: Signed-off-by: Toke Høiland-Jørgensen <toke at redhat.com>
> > text: Signed-off-by: Toke HÃ¸iland-JÃ¸rgensen <toke at redhat.com>
> Ah, great! Thank you! I pushed out a patch; I will put it in prod after
> the mid-day rush is over.
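For reference, the garbled name in the output above is exactly what UTF-8
bytes look like when decoded as iso-8859-1: "ø" is the two-byte UTF-8
sequence 0xC3 0xB8, which Latin-1 reads as the two characters "Ã" and "¸".
A minimal stdlib-only sketch that reproduces and reverses the corruption
offline, using the trailer line from the output as sample data (the
variable names are illustrative only):

```python
# Sample trailer taken from the POC output above.
trailer = "Signed-off-by: Toke Høiland-Jørgensen <toke at redhat.com>"

# What the server puts on the wire: UTF-8 encoded bytes.
body = trailer.encode("utf-8")

# bytes.decode() defaults to UTF-8, like res.content.decode() in the POC.
content_view = body.decode()

# Decoding the same bytes as iso-8859-1 (what res.text does for a text/*
# response without a charset) splits each multi-byte character in two.
text_view = body.decode("iso-8859-1")

print("content: " + content_view)
print("   text: " + text_view)

# The damage is reversible: re-encoding as Latin-1 restores the raw bytes.
repaired = text_view.encode("iso-8859-1").decode("utf-8")
print("repaired: " + repaired)
```

Because Latin-1 maps every possible byte to some character, the wrong
decode never raises an error, which is why the corruption is silent
rather than a crash.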
Following up, this is not a bug in requests, but a bug in patchwork.
Here are the headers it sends:
$ wget -S https://firstname.lastname@example.org/mbox/
--2020-12-01 13:45:02-- https://email@example.com/mbox/
Resolving patchwork.kernel.org (patchwork.kernel.org)... 184.108.40.206
Connecting to patchwork.kernel.org (patchwork.kernel.org)|220.127.116.11|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Tue, 01 Dec 2020 18:45:03 GMT
Content-Disposition: attachment; filename=net-inet_ecn-Fix-endianness-of-checksum-update-when-setting-ECT-1.patch
The HTTP/1.1 standard (RFC 2616, section 3.7.1) defines iso-8859-1 as the
default character encoding for text/* responses that carry no charset
parameter, so response.text is not wrong. Patchwork should be declaring
utf-8 explicitly in the header, since the content it serves is, in fact,
always utf-8:
Content-Type: text/plain; charset=utf-8
I don't have time to submit a proper patch, so I'm just cowardly cc'ing
the patchwork list in hopes that either someone does it for me, or that
I remember to do it at some later point. :)
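On the client side, the charset rule described above can be sketched with
the standard library alone. The helper name pick_charset is hypothetical,
and the example assumes (as the message implies, though the header listing
above is cut off) that patchwork currently sends a bare text/plain with no
charset parameter:

```python
from email.message import Message


def pick_charset(content_type):
    """Hypothetical helper: choose a charset for an HTTP response body
    following the RFC 2616 section 3.7.1 default for text/* types."""
    msg = Message()
    msg["Content-Type"] = content_type
    declared = msg.get_content_charset()  # lower-cased, or None if absent
    if declared:
        return declared
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"  # HTTP/1.1 default for text/* without charset
    return None  # not text: the caller must guess or treat it as binary


body = "Toke Høiland-Jørgensen".encode("utf-8")

# Assumed current header: no charset parameter, so the name is mangled.
print(body.decode(pick_charset("text/plain")))
# Header as proposed above: the name survives.
print(body.decode(pick_charset("text/plain; charset=utf-8")))
```

Until patchwork sends the charset, a requests user can override the choice
by setting res.encoding = 'utf-8' before reading res.text, or decode
res.content directly as the POC above does.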