[PATCH] pwclient: fix handling of UTF-8 char in submitter name

Jeremy Kerr jk at ozlabs.org
Mon Dec 13 11:58:19 EST 2010


Hi Mauro,

> I never used it, nor am I a python expert, but it seems that django defines
> a class of lazy utf decoders that won't cause python to crash due to a
> string that does not follow the proper encoding:
> 	http://docs.djangoproject.com/en/dev/ref/unicode/
> 
> I had one interesting case of a patch with a driver from staging being
> changed/moved to another place, with a string inside using a non-utf8
> encoding. Patchwork simply discarded this patch. I only noticed it because
> this was patch 6 of a sequence of patches, so I went to the ML to double
> check what was missing.

The parser (and pwclient) need to be fairly independent of django, as they're
both intended to be run on machines with a fairly minimal python environment.

However, the unicode decoder has a 'replace'-mode, where invalid byte 
sequences are replaced with U+FFFD REPLACEMENT CHARACTER:

  '\x80'.decode('utf-8', 'replace') == u'\ufffd'

The reason that I don't do this currently is that patchwork would now be 
altering your patches to something that the author didn't write. If you were 
to apply the resulting patch, you would be introducing the U+FFFD character to 
your source tree.
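
To make the trade-off concrete, here is a minimal sketch in Python 3 syntax.
decode_patch() and the sample patch line are hypothetical, made up for
illustration; this is not the actual parser code. It shows the two behaviours
discussed above: strict decoding rejects the patch, while 'replace' mode
accepts it but no longer round-trips to the bytes the author sent.

```python
# Hypothetical helper for illustration only; not part of patchwork or pwclient.
def decode_patch(raw, lossy=False):
    """Decode patch bytes as UTF-8.

    Strict mode returns None for invalid input (the patch is dropped).
    With lossy=True, invalid byte sequences become U+FFFD instead.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        if not lossy:
            return None
        return raw.decode('utf-8', 'replace')


# A made-up patch line containing a Latin-1 encoded character (0xe9),
# which is not valid UTF-8 in this position.
raw = b'+printk("caf\xe9");\n'

strict = decode_patch(raw)             # None: the patch would be discarded
lossy = decode_patch(raw, lossy=True)  # decoded, with U+FFFD substituted

# Re-encoding the lossy text does not give back the original bytes, so
# applying it would put U+FFFD into the source tree.
round_trip = lossy.encode('utf-8')
print(strict, repr(lossy), round_trip == raw)
```

Neither branch preserves the author's bytes and also yields valid text, which
is the problem described above.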

However, dropping patches isn't a great solution either, so other alternatives
are welcome :)

Cheers,


Jeremy

