[PING][PATCH] Fallback to common charsets when charset is None or x-unknown
Siddhesh Poyarekar
siddhesh at redhat.com
Mon Jun 30 16:03:33 EST 2014
Hi,
Ping!
On Thu, Jun 12, 2014 at 01:04:46AM +0530, Siddhesh Poyarekar wrote:
> Trying again after signing up to the mailing list (patch is slightly
> modified from my first submission, which may either be in moderation
> or may have gotten lost somehow):
>
> On Wed, Jun 11, 2014 at 04:09:16PM +0530, Siddhesh Poyarekar wrote:
> > Hi,
> >
> > We recently encountered a case in our glibc patchwork instance on
> > sourceware, where a patch was dropped because it had x-unknown
> > charset. I used the following patch to fix this in our instance. The
> > fix I used was to fall back on a set of encodings (instead of just
> > utf-8) when the charset is not mentioned or if it is set as x-unknown.
> >
> > I hope this is useful. I'd love to know if you all think there is a
> > better way to fix this so that I can implement that in our instance
> > instead of my hack.
> >
> > Cheers,
> > Siddhesh
>
> --- a/apps/patchwork/bin/parsemail.py 2014-06-11 15:53:12.685666812 +0530
> +++ b/apps/patchwork/bin/parsemail.py 2014-06-11 15:53:03.991667186 +0530
> @@ -147,6 +147,13 @@
>          return match.group(1)
>      return None
>
> +def try_decode(payload, charset):
> +    try:
> +        payload = unicode(payload, charset)
> +    except UnicodeDecodeError:
> +        return None
> +    return payload
> +
>  def find_content(project, mail):
>      patchbuf = None
>      commentbuf = ''
> @@ -157,15 +164,27 @@
>              continue
>
>          payload = part.get_payload(decode=True)
> -        charset = part.get_content_charset()
>          subtype = part.get_content_subtype()
>
> -        # if we don't have a charset, assume utf-8
> -        if charset is None:
> -            charset = 'utf-8'
> -
>          if not isinstance(payload, unicode):
> -            payload = unicode(payload, charset)
> +            charset = part.get_content_charset()
> +
> +            # If there is no charset or if it is unknown, then try some common
> +            # charsets before we fail.
> +            if charset is None or charset == 'x-unknown':
> +                try_charsets = ['utf-8', 'windows-1252', 'ascii', 'iso-8859-1']
> +            else:
> +                try_charsets = [charset]
> +
> +            for cset in try_charsets:
> +                decoded_payload = try_decode(payload, cset)
> +                if decoded_payload is not None:
> +                    break
> +            payload = decoded_payload
> +
> +            # Could not find a valid decoded payload. Fail.
> +            if payload is None:
> +                return (None, None)
>
>          if subtype in ['x-patch', 'x-diff']:
>              patchbuf = payload
> _______________________________________________
> Patchwork mailing list
> Patchwork at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
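
The patch above targets Python 2, where unicode() still exists. For
readers on Python 3, the same fallback idea might be sketched roughly as
below. This is not part of the submitted patch: decode_with_fallback and
FALLBACK_CHARSETS are hypothetical names chosen for illustration, and
catching LookupError (raised for charset names Python has no codec for)
is an assumption the patch itself does not make.

```python
from typing import Optional

# Same candidate order as in the patch above.
FALLBACK_CHARSETS = ['utf-8', 'windows-1252', 'ascii', 'iso-8859-1']


def try_decode(payload: bytes, charset: str) -> Optional[str]:
    """Return the decoded text, or None if payload cannot be decoded."""
    try:
        return payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # LookupError handling is an addition for this sketch; the
        # patch only catches UnicodeDecodeError.
        return None


def decode_with_fallback(payload: bytes,
                         charset: Optional[str]) -> Optional[str]:
    """Decode payload, trying common charsets when none is usable."""
    if charset is None or charset == 'x-unknown':
        candidates = FALLBACK_CHARSETS
    else:
        # A declared charset is trusted; no guessing is done for it.
        candidates = [charset]

    for cset in candidates:
        decoded = try_decode(payload, cset)
        if decoded is not None:
            return decoded
    # Nothing worked; the caller decides whether to drop the mail.
    return None
```

Two properties of this candidate list are worth keeping in mind. Every
byte sequence is valid iso-8859-1, so when the fallback list is used the
function always returns some string, even if the guess is wrong. And
since ASCII is a subset of UTF-8, the 'ascii' entry can never succeed
where 'utf-8' has already failed.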