[PATCH] pwclient: fix handling of UTF-8 char in submitter name

Jeremy Kerr jk at ozlabs.org
Mon Dec 13 18:17:24 EST 2010


Hi Mauro,

> I don't see much problem for the parser, as it runs on a server, but I
> agree that a lighter environment at the client side is interesting. Yet,
> it is better to install some additional python packages locally than to
> lose patches.

The parser may be run on client machines currently, to generate patch hashes. 

I think the django utils are fairly thin wrappers around the standard unicode 
objects though, so we should be alright with what's in the vanilla python 
install.

> > The reason that I don't do this currently is that patchwork would now be
> > altering your patches to something that the author didn't write. If you
> > were to apply the resulting patch, you would be introducing the U+FFFD
> > character to your source tree.
> > 
> > However, dropping patches isn't a great solution either, so other
> > alternatives welcome :)
> 
> Would it be possible to handle the error at decode with "try"? If so, maybe
> you could add some logic there to try to decode first with the email
> charset. Then, try utf-8. If both fail, try to decode with some other
> protocols, like iso8859-11. This will likely catch 99% of the issues. If
> everything fails, it is preferred to use the replacement character than to
> lose the patch.
> 
> I would also add a meta-tag to indicate the cases where patchwork is
> guessing a type (or using a replacement character). This way, the
> maintainer may manually take care of the fixes.

That sounds pretty reasonable. For cases like these, I'd like to add 
'warnings' to the patch; either a 'had to guess the charset' or 'invalid 
encoding', depending on what we had to do to get a successful parse. The 
warnings would then appear in the web UI, or on stderr when running pwclient.
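
A rough sketch of how the fallback chain and the warnings might fit together,
in Python 3. This is only an illustration: the function name, the default
fallback list and the return shape are made up here and are not what
patchwork does today; only the two warning strings come from the text above.

```python
def decode_with_fallback(payload, declared_charset=None, fallbacks=('utf-8',)):
    """Decode a bytes payload, returning (text, warnings).

    Hypothetical helper: tries the charset declared in the email first,
    then each fallback in order, and only then resorts to U+FFFD.
    """
    warnings = []

    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates += [c for c in fallbacks if c not in candidates]

    for charset in candidates:
        try:
            text = payload.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a declared charset python doesn't know
            continue
        if charset != declared_charset:
            warnings.append('had to guess the charset')
        return text, warnings

    # Nothing decoded cleanly: keep the patch, but flag that the content
    # now contains replacement characters the author didn't write.
    warnings.append('invalid encoding')
    return payload.decode('utf-8', 'replace'), warnings
```

Other charsets to try (such as the iso8859-11 suggested above) could be
passed in through 'fallbacks'. The caller would attach the returned warnings
to the patch, so they can be shown in the web UI or printed by pwclient.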

*adds to the TODO list*

Cheers,


Jeremy

