[PATCH] Fallback to common charsets when charset is None or x-unknown

Siddhesh Poyarekar siddhesh at redhat.com
Sat Jul 12 03:09:26 EST 2014


Hi,

Ping!

Siddhesh

On Fri, Jul 04, 2014 at 09:43:23PM +0530, Siddhesh Poyarekar wrote:
> We recently encountered a case in our glibc patchwork instance on
> sourceware where a patch was dropped because its charset was
> x-unknown.  I used the following patch to fix this in our instance.
> The fix falls back to a list of encodings (instead of just utf-8)
> when the charset is not specified or is set to x-unknown.
> 
> v2 removes ascii as a fallback: ascii is a subset of utf-8, so it
> cannot succeed on a payload that utf-8 has already failed to decode.
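The fallback order can be checked directly. The following is a minimal Python 3 sketch (the patch itself is Python 2 and uses `unicode()`; here `bytes.decode` plays the same role). `first_decodable` and `ascii_rescues` are hypothetical helpers for illustration and are not part of parsemail.py. The sketch shows that iso-8859-1 maps every byte value and therefore always succeeds as the last entry, that windows-1252 rejects a few undefined bytes such as 0x81, and that ascii placed after utf-8 never decodes anything utf-8 rejected.

```python
# Fallback order from the patch, used when no charset (or x-unknown) is given.
FALLBACK_CHARSETS = ('utf-8', 'windows-1252', 'iso-8859-1')


def first_decodable(data, charsets=FALLBACK_CHARSETS):
    """Return (charset, text) for the first charset that decodes data, else None."""
    for cset in charsets:
        try:
            return cset, data.decode(cset)
        except UnicodeDecodeError:
            continue
    return None


def ascii_rescues(data):
    """True only if 'ascii' would decode data after 'utf-8' had failed."""
    try:
        data.decode('utf-8')
        return False  # utf-8 succeeded, so ascii would never be tried
    except UnicodeDecodeError:
        pass
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


samples = [
    b'caf\xc3\xa9',       # valid UTF-8
    b'caf\xe9',           # e-acute in Windows-1252/Latin-1, invalid as UTF-8
    b'\x81',              # undefined in Python's windows-1252 table
    bytes(range(256)),    # every byte value; only iso-8859-1 accepts all of them
]
for data in samples:
    print(data[:8], first_decodable(data), ascii_rescues(data))
```

Because iso-8859-1 assigns a character to every byte, the loop in the patch can only end with a `None` payload when a specific charset was declared and that charset fails.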
> 
> Signed-off-by: Siddhesh Poyarekar <siddhesh at redhat.com>
> ---
>  apps/patchwork/bin/parsemail.py | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/apps/patchwork/bin/parsemail.py b/apps/patchwork/bin/parsemail.py
> index b6eb97a..7c173d9 100755
> --- a/apps/patchwork/bin/parsemail.py
> +++ b/apps/patchwork/bin/parsemail.py
> @@ -147,6 +147,13 @@ def find_pull_request(content):
>          return match.group(1)
>      return None
>  
> +def try_decode(payload, charset):
> +    try:
> +        payload = unicode(payload, charset)
> +    except UnicodeDecodeError:
> +        return None
> +    return payload
> +
>  def find_content(project, mail):
>      patchbuf = None
>      commentbuf = ''
> @@ -157,15 +164,27 @@ def find_content(project, mail):
>              continue
>  
>          payload = part.get_payload(decode=True)
> -        charset = part.get_content_charset()
>          subtype = part.get_content_subtype()
>  
> -        # if we don't have a charset, assume utf-8
> -        if charset is None:
> -            charset = 'utf-8'
> -
>          if not isinstance(payload, unicode):
> -            payload = unicode(payload, charset)
> +            charset = part.get_content_charset()
> +
> +            # If there is no charset or if it is unknown, then try some common
> +            # charsets before we fail.
> +            if charset is None or charset == 'x-unknown':
> +                try_charsets = ['utf-8', 'windows-1252', 'iso-8859-1']
> +            else:
> +                try_charsets = [charset]
> +
> +            for cset in try_charsets:
> +                decoded_payload = try_decode(payload, cset)
> +                if decoded_payload is not None:
> +                    break
> +            payload = decoded_payload
> +
> +            # Could not find a valid decoded payload.  Fail.
> +            if payload is None:
> +                return (None, None)
>  
>          if subtype in ['x-patch', 'x-diff']:
>              patchbuf = payload
> -- 
> 1.9.3
> 
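For reference, here is the patched decoding step ported to Python 3, where `get_payload(decode=True)` returns `bytes` for a non-multipart part. This is a sketch, not the code in parsemail.py: `decode_part` is a hypothetical name, and the `LookupError` branch in `try_decode` is an assumption added beyond the patch, which only catches `UnicodeDecodeError`.

```python
import email
from email.message import Message
from typing import Optional

# Charsets tried, in order, when none is declared or it is 'x-unknown'
# (the same list as in the patch).
FALLBACK_CHARSETS = ('utf-8', 'windows-1252', 'iso-8859-1')


def try_decode(payload: bytes, charset: str) -> Optional[str]:
    """Decode payload with charset; return None if that is not possible."""
    try:
        return payload.decode(charset)
    except UnicodeDecodeError:
        return None
    except LookupError:
        # Not in the patch (assumption): a charset name Python has no codec
        # for is treated as a failed decode instead of raising.
        return None


def decode_part(part: Message) -> Optional[str]:
    """Hypothetical helper: decode one MIME part, or return None on failure."""
    payload = part.get_payload(decode=True)
    if payload is None:
        return None  # multipart containers carry no payload of their own

    charset = part.get_content_charset()
    if charset is None or charset == 'x-unknown':
        try_charsets = FALLBACK_CHARSETS
    else:
        try_charsets = (charset,)

    for cset in try_charsets:
        decoded = try_decode(payload, cset)
        if decoded is not None:
            return decoded
    return None  # the patch makes find_content() return (None, None) here


# Usage: a patch mail whose charset is x-unknown and whose body is Latin-1.
raw = (b'Content-Type: text/x-patch; charset="x-unknown"\n'
       b'Content-Transfer-Encoding: 8bit\n'
       b'\n'
       b'caf\xe9\n')
print(decode_part(email.message_from_bytes(raw)))
```

The extra branch only matters for a declared charset that Python has no codec for: in the patch as posted, `unicode(payload, charset)` raises `LookupError` for such a name, and `try_decode` does not catch it. Whether such a mail should be dropped or retried with the common charsets is a policy choice; the sketch simply treats it as a failed decode.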



> _______________________________________________
> Patchwork mailing list
> Patchwork at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
