[RFC PATCH] pwclient: Force xmlrpc client to return unicode strings
Stephen Finucane
stephen at that.guru
Tue May 23 20:08:43 AEST 2017
On Mon, 2017-05-22 at 11:37 +0200, Robin Jarry wrote:
> On python 2, the reference implementation of the XML-RPC unmarshaller
> decodes strings to unicode with the selected encoding (utf-8 by
> default) but it tries to re-encode the unicode strings to ascii bytes
> before returning the values. If it fails, it leaves the value as
> unicode.
>
> See these links for more details:
>
> https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l878
> https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l180
>
> https://hg.python.org/cpython/file/3.6/Lib/xmlrpc/client.py#l753
>
> Override the Transport.getparser() method to return a subclass of
> unmarshaller that does not re-encode to ascii but preserves the
> unicode strings intact. This allows to have similar behaviour in both
> python 2 and python 3.
>
> Signed-off-by: Robin Jarry <robin.jarry at 6wind.com>
> ---
> patchwork/bin/pwclient | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/patchwork/bin/pwclient b/patchwork/bin/pwclient
> index 5fcb0844b923..d3e3d9cde134 100755
> --- a/patchwork/bin/pwclient
> +++ b/patchwork/bin/pwclient
> @@ -97,6 +97,28 @@ class Filter(object):
>          return str(self.d)
>
>
> +if xmlrpclib.FastUnmarshaller is not None:
> +    Unmarshaller = xmlrpclib.FastUnmarshaller
> +else:
> +    Unmarshaller = xmlrpclib.Unmarshaller
> +
> +class UnicodeUnmarshaller(Unmarshaller):
> +
> +    dispatch = Unmarshaller.dispatch.copy()
> +
> +    def end_string(self, data):
> +        if self._encoding:
> +            data = xmlrpclib._decode(data, self._encoding)
> +        # the python 2.7 reference implementation tries to re-encode to
> +        # ascii bytes here but leaves unicode if it fails, do not try to
> +        # re-encode to ascii byte string

Thanks for doing this, Robin :) The patch looks pretty good as-is.
However, based on what you've said, all this hassle traces back to the
'_stringify' function [1]. I wonder if we could simplify things by
merely monkey patching that function instead? Something like the below
_could_ do the job, I'd imagine?

    if sys.version_info[0] < 3:
        def _stringify(string):
            return string
        xmlrpclib._stringify = _stringify

Other than the fact that we're messing with a private function (we're
consenting adults here), does this look sensible? Any thoughts on how
this compares?
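
For comparison, a rough sketch of the two behaviours being discussed. The
function names below are hypothetical stand-ins, not part of xmlrpclib; the
first mimics the "try ASCII, fall back to unicode" behaviour described in the
commit message, the second is the identity replacement suggested above. It
assumes the pure-Python unmarshaller is the one in use.

```python
# Illustrative stand-ins only; these names are hypothetical, not xmlrpclib API.

def stringify_ascii_if_possible(text):
    """Mimic the described behaviour: try ASCII bytes, else keep the text."""
    try:
        return text.encode("ascii")
    except UnicodeError:
        return text


def stringify_identity(text):
    """The suggested replacement: hand the decoded text back untouched."""
    return text


samples = [u"patch", u"caf\u00e9"]

# Result type depends on content: ASCII-only input is re-encoded to a byte
# string, anything else stays as text.
mixed = [type(stringify_ascii_if_possible(s)) for s in samples]

# Result type is uniform: every value stays as text.
uniform = [type(stringify_identity(s)) for s in samples]
```

With the first function the caller gets a different type depending on whether
the string happens to be pure ASCII; with the second it is always text, which
is the behaviour both approaches in this thread are after.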
Cheers,
Stephen
[1] https://github.com/python/cpython/blob/2.7/Lib/xmlrpclib.py#L181-L186
PS: I noticed this didn't get picked up by Patchwork. I wonder why?

> +        self.append(data)
> +        self._value = 0
> +
> +    dispatch['string'] = end_string
> +    dispatch['name'] = end_string
> +
> +
> class Transport(xmlrpclib.SafeTransport):
>
>      def __init__(self, url):
> @@ -132,6 +154,26 @@ class Transport(xmlrpclib.SafeTransport):
>              handler = '%s://%s%s' % (self.scheme, self.host, handler)
>              xmlrpclib.Transport.send_request(self, connection, handler,
>                                               request_body)
> +        def getparser(self):
> +            # copied from Python 2.7 Lib/xmlrpclib.py to support our custom
> +            # UnicodeUnmarshaller
> +            if xmlrpclib.FastParser and xmlrpclib.FastUnmarshaller:
> +                if self._use_datetime:
> +                    mkdatetime = xmlrpclib._datetime_type
> +                else:
> +                    mkdatetime = xmlrpclib._datetime
> +                target = UnicodeUnmarshaller(True, False, xmlrpclib._binary,
> +                                             mkdatetime, xmlrpclib.Fault)
> +                parser = xmlrpclib.FastParser(target)
> +            else:
> +                target = UnicodeUnmarshaller(use_datetime=self._use_datetime)
> +                if xmlrpclib.FastParser:
> +                    parser = xmlrpclib.FastParser(target)
> +                elif xmlrpclib.ExpatParser:
> +                    parser = xmlrpclib.ExpatParser(target)
> +                else:
> +                    parser = xmlrpclib.SlowParser(target)
> +            return parser, target
>      else:  # Python 3
>          def send_request(self, host, handler, request_body, debug):
>              handler = '%s://%s%s' % (self.scheme, host, handler)