Proposed HACKING guidelines for string encoding.

Change-Id: Ifc120e33f08868ead8b02320dc982f5528db4965
Andrew Bogott 2012-03-13 18:11:49 -05:00
parent bfcd962d97
commit 204ffabe38

HACKING

@ -73,3 +73,43 @@ Docstrings
:returns: description of the return value
"""
Text encoding
-------------
- All text within Python code should be of type 'unicode'.
WRONG:
>>> s = 'foo'
>>> s
'foo'
>>> type(s)
<type 'str'>
RIGHT:
>>> u = u'foo'
>>> u
u'foo'
>>> type(u)
<type 'unicode'>
- Text crossing the boundary between internal unicode and external strings
should always be decoded or encoded immediately and explicitly.
- All external text that is not explicitly encoded (database storage,
command-line arguments, etc.) should be presumed to be encoded as utf-8.
WRONG:
mystring = infile.readline()
myreturnstring = do_some_magic_with(mystring)
outfile.write(myreturnstring)
RIGHT:
mystring = infile.readline()
mytext = mystring.decode('utf-8')
returntext = do_some_magic_with(mytext)
returnstring = returntext.encode('utf-8')
outfile.write(returnstring)
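
The RIGHT pattern above can be tried end to end with in-memory byte streams. This is only a minimal sketch: `shout` is a hypothetical stand-in for `do_some_magic_with`, `process_stream` is an illustrative helper name, and `io.BytesIO` objects stand in for files opened in binary mode. None of these names come from the guidelines themselves.

```python
import io


def shout(text):
    """Hypothetical stand-in for do_some_magic_with(); sees text only."""
    return text.upper()


def process_stream(infile, outfile, encoding='utf-8'):
    """Decode each line on the way in and encode each result on the way out."""
    for raw_line in infile:
        # External bytes become internal text as soon as they are read.
        text = raw_line.decode(encoding)
        result = shout(text)
        # Internal text becomes external bytes just before it is written.
        outfile.write(result.encode(encoding))


# In-memory byte streams stand in for real input and output files.
source = io.BytesIO(u'caf\xe9\n'.encode('utf-8'))
sink = io.BytesIO()
process_stream(source, sink)
```

Keeping both conversions inside one small helper means the processing function never has to guess whether it was handed encoded bytes or text.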