#python uses #ascii instead of #unicode by default #gotcha

By default, Python 2 uses ASCII as its default encoding, which means that if you try to process a unicode string with a non-ASCII character in it, and Python has to convert it to bytes implicitly, you will get the error

UnicodeEncodeError: 'ascii' codec can't encode character u'\xXX' in position YYYYY: ordinal not in range(128)
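
The same error can be triggered on purpose by encoding with the ascii codec explicitly, which is roughly what Python 2 does behind the scenes. A minimal sketch, assuming a made-up sample string (not from any real script):

```python
# hypothetical sample: 'cafe' with an accented e (U+00E9) as a unicode string
text = u'caf\xe9'

try:
    # explicit version of the implicit conversion that fails
    text.encode('ascii')
except UnicodeEncodeError as err:
    message = str(err)

# encoding with utf-8 instead succeeds, since utf-8 can represent U+00E9
utf8_bytes = text.encode('utf-8')
```

Here `message` ends up holding the "ordinal not in range(128)" text, while the utf-8 call goes through without complaint.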

So to force your script to use utf-8 instead, you need to run the following at the top of the script (see http://nedbatchelder.com/text/unipain.html for background):

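# Python 2 only: neither reload() as a builtin nor setdefaultencoding() exists in Python 3
import sys
# site.py deletes sys.setdefaultencoding at startup; reloading sys brings it back
reload(sys)
sys.setdefaultencoding('utf8')
# confirm that the default encoding has changed
print sys.getdefaultencoding()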
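
An alternative that does not depend on the default encoding at all is to decode bytes as they come in and encode text as it goes out, so nothing is ever converted implicitly. A rough sketch of that idea, assuming the input really is utf-8 (the sample bytes and variable names are made up for illustration):

```python
# bytes as they might arrive from a file or socket (hypothetical sample)
raw = b'caf\xc3\xa9'

# decode once on the way in
text = raw.decode('utf-8')

# work with unicode text inside the program
shout = text.upper()

# encode once on the way out
out = shout.encode('utf-8')
```

This runs the same way on Python 2 and Python 3, because every conversion names its encoding explicitly.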