[PATCH] pwclient: fix handling of UTF-8 char in submitter name
jk at ozlabs.org
Mon Dec 13 18:17:24 EST 2010
> I don't see much problem for the parser, as it runs on a server, but I
> agree that a lighter environment at the client side is interesting. Yet,
> it is better to install some additional python packages locally than to
> lose patches.
The parser may be run on client machines currently, to generate patch hashes.
I think the django utils are fairly thin wrappers around the standard unicode
objects though, so we should be alright with what's in the vanilla python
> > The reason that I don't do this currently is that patchwork would now be
> > altering your patches to something that the author didn't write. If you
> > were to apply the resulting patch, you would be introducing the U+FFFD
> > character to your source tree.
> > However, dropping patches isn't a great solution either, so other
> > alternatives welcome :)
> Would it be possible to handle the error at decode with "try"? If so, maybe
> you could add some logic there to try to decode first with the email
> charset. Then, try utf-8. If both fail, try to decode with some other
> encodings, like iso8859-11. This will likely catch 99% of the issues. If
> everything fails, it is preferable to use the replacement character than to
> lose the patch.
> I would also add a meta-tag to indicate the cases where patchwork is
> guessing a type (or using a replacement character). This way, the
> maintainer may manually take care of the fixes.
That sounds pretty reasonable. For cases like these, I'd like to add
'warnings' to the patch; either a 'had to guess the charset' or 'invalid
encoding', depending on what we had to do to get a successful parse. The
warnings would then appear in the web UI, or on stderr when running pwclient.
*adds to the TODO list*
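
As a rough sketch of the fallback-plus-warnings idea above (not actual
patchwork code): try the charset the email declares, then a short list of
guesses, and only use the replacement character as a last resort. The
function name, the fallback list and the exact warning strings below are all
made up for illustration, and it is written for a Python where bytes and
str are distinct types.

```python
# Hypothetical helper; nothing here is part of patchwork's parser.
# The default fallback list just mirrors the encodings named in this thread.
DEFAULT_FALLBACKS = ('utf-8', 'iso8859-11')


def decode_with_warnings(payload, declared_charset=None,
                         fallbacks=DEFAULT_FALLBACKS):
    """Decode a bytes payload, returning (text, warnings).

    warnings is an empty list when the declared charset worked as-is.
    """
    # Build the list of attempts: the declared charset first (if any),
    # then the guesses. The flag records whether an attempt is a guess.
    attempts = []
    if declared_charset:
        attempts.append((declared_charset, False))
    attempts.extend((charset, True) for charset in fallbacks)

    for charset, is_guess in attempts:
        try:
            text = payload.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Bad bytes for this charset, or a charset name that this
            # Python does not know about: move on to the next attempt.
            continue
        warnings = []
        if is_guess:
            warnings.append('had to guess the charset (%s)' % charset)
        return text, warnings

    # Nothing decoded cleanly: keep the patch, but substitute U+FFFD for
    # the undecodable bytes and flag it so a maintainer can take a look.
    return payload.decode('utf-8', errors='replace'), ['invalid encoding']
```

The returned warnings list is what would end up attached to the patch, for
display in the web UI or on stderr from pwclient. Note that a catch-all
encoding such as iso8859-1 accepts any byte sequence, so putting one in the
fallback list means the 'invalid encoding' path is never reached.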