[PATCH] pwclient: fix handling of UTF-8 char in submitter name

Mauro Carvalho Chehab mchehab at redhat.com
Mon Dec 13 13:03:14 EST 2010


On 12-12-2010 22:58, Jeremy Kerr wrote:
> Hi Mauro,
> 
>> I have never used it, nor am I a Python expert, but it seems that Django
>> defines a class of lazy UTF decoders that won't cause Python to crash on a
>> string that does not follow the proper encoding:
>> 	http://docs.djangoproject.com/en/dev/ref/unicode/
>>
>> I had one interesting case of a patch that moved a staging driver to
>> another place, with a string inside that was not valid UTF-8.
>> Patchwork simply discarded this patch. I only noticed because it was
>> patch 6 of a sequence of patches, so I went to the ML to double-check what
>> was missing.
> 
> The parser (and pwclient) need to be fairly independent of django, as they're
> both intended to be run on machines with a fairly minimal python environment.

I don't see much of a problem for the parser, as it runs on a server, but I agree
that a lighter environment on the client side is desirable. Still, it is better to
install some additional Python packages locally than to lose patches.
 
> 
> However, the unicode decoder has a 'replace'-mode, where invalid byte 
> sequences are replaced with U+FFFD REPLACEMENT CHARACTER:
> 
>   '\x80'.decode('utf-8', 'replace') == u'\ufffd'

Interesting.
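
A minimal sketch of that behaviour, assuming Python 3 (where the raw input
is a bytes object) rather than the Python 2 syntax quoted above. The sample
bytes are made up for illustration:

```python
# Hypothetical input: 0x80 is a stray continuation byte, so not valid UTF-8.
raw = b"caf\x80"

# Strict decoding (the default) raises on the invalid byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'replace' substitutes U+FFFD for the invalid byte instead of raising.
replaced = raw.decode("utf-8", "replace")
```

Note that the replaced text no longer round-trips to the original bytes,
which is the concern raised in the quoted text that follows.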
 
> The reason that I don't do this currently is that patchwork would now be 
> altering your patches to something that the author didn't write. If you were 
> to apply the resulting patch, you would be introducing the U+FFFD character to 
> your source tree.
> 
> However, dropping patches isn't a great solution either, so other alternatives 
> welcome :)

Would it be possible to handle the decode error with "try"? If so, maybe you could
add some logic there to first try decoding with the email charset, then with utf-8.
If both fail, try some other encodings, like iso8859-11. This will
likely catch 99% of the issues. If everything fails, it is better to use the
replacement character than to lose the patch.

I would also add a meta-tag to indicate the cases where patchwork is guessing the
encoding (or using a replacement character). This way, the maintainer can fix
those patches manually.
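
One way such a tag could look, purely as an illustration: the header name
X-Charset-Handling and the function tag_patch are invented here, and how
patchwork would actually store the flag is left open:

```python
from email.message import Message

# Hypothetical header name; nothing in patchwork defines this tag.
TAG = "X-Charset-Handling"


def tag_patch(msg, declared, used, lossy):
    """Mark msg when its text was not decoded with the declared charset."""
    if lossy:
        # Invalid bytes were replaced, so the patch content was altered.
        msg[TAG] = "replaced; invalid bytes became U+FFFD"
    elif used != declared:
        # Decoding succeeded, but only with a guessed charset.
        msg[TAG] = "guessed; decoded as %s" % used
    # No tag at all means the declared charset decoded cleanly.
    return msg
```

A maintainer could then filter on the tag to find the patches that need a
manual look.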

Cheers,
Mauro

