Michael Barton 0080337897 reject problematic object names
We had this problem:

    >> : x = '\xed\xa0\xbc\xed\xbc\xb8'
    >> : x == x.decode('utf-8').encode('utf-8')
    << : False

That str contains two three-byte utf-8 sequences, each encoding one half of
a UTF-16 surrogate pair (U+D83C and U+DF38), which I guess python is
combining into one unicode character, which it then encodes as a single
four-byte utf-8 sequence. Like this:

    >> : u'\ud83c\udf38'
    << : u'\U0001f338'

I don't entirely understand that, but having a different byte representation
after round-tripping through unicode causes problems with replication and
listings.
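
A rough sketch of what seems to be going on, written for Python 3 purely as
an illustration (the session above is Python 2, and the variable names here
are made up for the example). It decodes the two halves with the
'surrogatepass' error handler, combines them with the standard UTF-16
surrogate arithmetic, and shows that the combined character encodes to
different bytes than the original str:

```python
# Illustration only: Python 3, not the Python 2 code this patch touches.
raw = b'\xed\xa0\xbc\xed\xbc\xb8'

# 'surrogatepass' lets the decoder accept utf-8-encoded surrogate halves,
# which the default strict handler rejects in Python 3.
halves = raw.decode('utf-8', 'surrogatepass')
high, low = (ord(c) for c in halves)

# Standard UTF-16 rule for combining a high and a low surrogate.
combined = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The single combined character encodes as one four-byte sequence.
reencoded = chr(combined).encode('utf-8')

print(hex(high), hex(low), hex(combined), reencoded != raw)
```

The six input bytes turn into four different bytes once the pair is
combined, which is the mismatch described above.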

This patch just rejects any name whose bytes don't re-encode to the same
thing after being decoded as utf-8.
If someone smarter wants to do something different, please speak up.
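
A minimal sketch of that kind of check, assuming Python 3 and a hypothetical
helper name (round_trips_as_utf8 is not taken from the patch itself). Note
that Python 3's strict decoder already refuses encoded surrogates, so the
problematic name fails at the decode step rather than at the comparison:

```python
def round_trips_as_utf8(name: bytes) -> bool:
    """Return True if name survives a utf-8 decode/encode round trip.

    Hypothetical helper for illustration; the actual patch may differ.
    """
    try:
        return name.decode('utf-8').encode('utf-8') == name
    except UnicodeError:
        # Bytes that are not valid utf-8 can't round-trip either.
        return False


bad = b'\xed\xa0\xbc\xed\xbc\xb8'         # encoded surrogate halves
good = '\U0001f338'.encode('utf-8')       # the same character, encoded once

print(round_trips_as_utf8(bad), round_trips_as_utf8(good))
```

A caller would reject the request whenever the helper returns False.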

Change-Id: I9ac48ac2693e4121be6585c6e4f5d0079e9bb3e4
2014-10-27 16:29:07 +00:00
2014-09-18 21:18:50 -07:00