swift/test/unit/common/test_constraints.py
Samuel Merritt 331b14238e Reject object names with Unicode surrogates
Technically, you can't encode surrogates into UTF-8 at all, but Python
2 lets you get away with it. Python 3 does not.

We already have a check for surrogate pairs (commit 0080337), but not
one for lone surrogates. This commit forbids object names with lone
surrogates in them.
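A minimal sketch of such a check, written for Python 3 rather than the Python 2 Swift ran on at the time. `has_surrogates` and `is_valid_object_name` are hypothetical names used only for illustration, not Swift's actual helpers. The raw name is decoded with the `surrogatepass` error handler to mimic Python 2's lenient decoder. Any code point in U+D800..U+DFFF is then rejected. One range test covers both lone surrogates and separately encoded surrogate pairs.

```python
SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF


def has_surrogates(name: str) -> bool:
    """True if any code point in `name` is a high or low surrogate."""
    return any(SURROGATE_MIN <= ord(ch) <= SURROGATE_MAX for ch in name)


def is_valid_object_name(raw: bytes) -> bool:
    """Hypothetical validator: non-empty, decodable, and surrogate-free."""
    try:
        # "surrogatepass" lets 3-byte surrogate sequences through, the way
        # Python 2's UTF-8 decoder did, so the explicit check below matters.
        decoded = raw.decode("utf-8", "surrogatepass")
    except UnicodeDecodeError:
        return False  # not UTF-8 at all (e.g. a stray 0xff byte)
    return bool(decoded) and not has_surrogates(decoded)


print(is_valid_object_name(b"\xed\xa0\xbc"))           # lone U+D83C
print(is_valid_object_name("caf\u00e9".encode("utf-8")))  # ordinary name
```

Under these assumptions, a correctly encoded 4-byte character such as `b"\xf0\x9f\x98\x80"` passes, because it decodes to a single code point above U+FFFF, not to surrogates.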

The problem with surrogates is trivially reproducible:

    swift@saio:~$ python2.7
    Python 2.7.3 (default, Feb 27 2014, 19:58:35)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'\xed\xa0\xbc'.decode('utf-8')
    u'\ud83c'
    >>>

    swift@saio:~$ python3.3
    Python 3.3.5 (default, Aug  4 2014, 15:27:24)
    [GCC 4.6.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b'\xed\xa0\xbc'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
    >>>
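For comparison, the Python 2 behaviour can be reproduced on Python 3 with the standard `surrogatepass` error handler. This sketch shows both directions for the same three bytes used above. Strict decoding and strict encoding both refuse. `surrogatepass` yields the `u'\ud83c'` value that Python 2 printed and round-trips it back to the original bytes.

```python
raw = b"\xed\xa0\xbc"  # 3-byte encoding of the lone high surrogate U+D83C

# Strict decoding (Python 3's default) rejects the bytes, as in the transcript.
try:
    raw.decode("utf-8")
    strict_decoded = True
except UnicodeDecodeError:
    strict_decoded = False

# "surrogatepass" accepts them and produces the value Python 2 returned.
lenient = raw.decode("utf-8", "surrogatepass")

# Encoding behaves the same way: strict refuses, surrogatepass round-trips.
try:
    lenient.encode("utf-8")
    strict_encoded = True
except UnicodeEncodeError:
    strict_encoded = False

print(strict_decoded, strict_encoded, lenient == "\ud83c")  # False False True
assert lenient.encode("utf-8", "surrogatepass") == raw
```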

See also http://bugs.python.org/issue9133

Change-Id: I7c31022e8a028c3cdf2ed1586349509d96cfded9
2014-11-07 14:01:22 -08:00