[Vobject] unicode problem
Jeffrey Harris
jeffrey at osafoundation.org
Wed May 28 14:33:44 CDT 2008

Hi Anil,
> BEGIN:VCARD
> VERSION:2.1
> N;CHARSET=UTF-8:M.Sc.;Beno\303\256t Lef\303\251vre,
> FN;QUOTED-PRINTABLE:Beno=EEt Lef=E9vre, M.Sc.
> EMAIL;PREF;INTERNET:someone at something.com
> END:VCARD
> 
> 
> Since the data is in utf-8, I wanted to decode to Unicode so I do:
> name = vcard.fn.value.encode('utf-8')
> 
> but that is throwing
> <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode
> byte 0xee in position 4: ordinal not in range(128)
> 
> 
> I am still new to Unicoding, so I might've missed something obvious. Thanks!
You're doing everything right; this is bug 9814:
https://bugzilla.osafoundation.org/show_bug.cgi?id=9814
Just to be clear for anyone reading the list archives later: when you cut
and pasted the vCard into email, the UTF-8 octets came through as their
escaped octal representations (\303\256, for instance, is the UTF-8
encoding of î), but I know what you meant :)
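For anyone puzzling over the traceback: the byte 0xee at position 4 appears to come from the FN line, not the N line. Once quoted-printable is decoded, =EE and =E9 become the single ISO-8859-1 bytes for î and é. In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the default (ASCII) codec, and that hidden step is what fails. The Python 3 sketch below assumes the value came back as a byte string rather than unicode (which the error message suggests). It performs the ASCII step explicitly to reproduce the same message, then decodes the bytes as ISO-8859-1.

```python
# FN value after quoted-printable decoding: =EE and =E9 become single
# ISO-8859-1 bytes (0xEE is 'î', 0xE9 is 'é').
fn_bytes = b'Beno\xeet Lef\xe9vre, M.Sc.'

# Python 2's str.encode('utf-8') implicitly ran an ASCII decode first;
# doing that step explicitly reproduces the reported error.
ascii_error = None
try:
    fn_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
print(ascii_error)
# 'ascii' codec can't decode byte 0xee in position 4: ordinal not in range(128)

# Decoding with the charset the bytes are actually in gives proper unicode.
name = fn_bytes.decode('iso-8859-1')
print(name)  # Benoît Lefévre, M.Sc.
```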
A helpful user has submitted a patch for this problem; I'm (belatedly)
working on committing it now.
One thing to note: by default, vobject expects streams more modern than
vCard 2.1 (vCard 3.0 or VCALENDAR 2.0), so it ignores the CHARSET
parameter. Instead, it hopes the entire stream is already unicode; if
it isn't, it tries to decode it as (in this order) 'utf-8', 'utf-16-LE',
'utf-16-BE', and 'iso-8859-1'.
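The fallback order described above can be pictured with a small Python 3 sketch. The helper name decode_stream is hypothetical and this is not vobject's actual code; it only mirrors the listed order: utf-8, then utf-16-LE, then utf-16-BE, then iso-8859-1, which never fails because it maps every byte to a code point.

```python
def decode_stream(data):
    """Hypothetical helper: return (text, encoding_used) for a whole stream.

    Mirrors the fallback order described in the post; not vobject's code.
    """
    if isinstance(data, str):
        # Already unicode, so nothing needs decoding.
        return data, None
    for encoding in ('utf-8', 'utf-16-le', 'utf-16-be'):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    # iso-8859-1 maps every byte 0x00-0xFF to a code point, so it cannot fail.
    return data.decode('iso-8859-1'), 'iso-8859-1'


print(decode_stream('Benoît'.encode('utf-8')))  # ('Benoît', 'utf-8')
print(decode_stream(b'Lef\xe9vre'))             # ('Lefévre', 'iso-8859-1')
```

One consequence worth knowing: almost any even-length byte string decodes without error as UTF-16, so in this sketch Latin-1 input such as b'Beno\xeet' is accepted as 'utf-16-le' and comes out as unrelated characters; only odd-length input ever reaches iso-8859-1. Using each line's CHARSET avoids guessing at the whole stream.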
However, after I commit the fix for bug 9814, you'll be able to pass an
allowQP flag to readOne and readComponents. It switches parsing from
regular expressions to a slower state machine, which is necessary to
handle quoted-printable. When this flag is passed in as True, each
content line's own CHARSET parameter will be used to decode its value
into unicode, defaulting to iso-8859-1 when no CHARSET is given.
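As a rough illustration of the per-line decoding step (not the state-machine parser itself), here is a Python 3 sketch using only the standard library's quopri module. The function name decode_value and the dict-of-parameters shape are hypothetical, not vobject's API. It applies quoted-printable decoding when the line is flagged that way, then decodes with the line's CHARSET, defaulting to iso-8859-1 as described above. With vobject itself, per the description above, you would simply pass allowQP=True to readOne or readComponents once the fix is committed.

```python
import quopri


def decode_value(raw_value, params):
    """Decode one content line's value using that line's own parameters.

    Hypothetical helper. raw_value is the value as bytes (the text after
    the colon); params maps upper-cased parameter names to values, with
    bare vCard 2.1 flags such as QUOTED-PRINTABLE stored as None.
    """
    data = raw_value
    encoding = (params.get('ENCODING') or '').upper()
    if 'QUOTED-PRINTABLE' in params or encoding == 'QUOTED-PRINTABLE':
        # =EE -> byte 0xEE, =E9 -> byte 0xE9, and so on.
        data = quopri.decodestring(data)
    # Use the line's CHARSET if present, otherwise fall back to iso-8859-1.
    charset = params.get('CHARSET') or 'iso-8859-1'
    return data.decode(charset)


# The two name lines from the vCard above, as raw bytes.
fn = decode_value(b'Beno=EEt Lef=E9vre, M.Sc.', {'QUOTED-PRINTABLE': None})
n = decode_value(b'M.Sc.;Beno\xc3\xaet Lef\xc3\xa9vre,', {'CHARSET': 'UTF-8'})
print(fn)  # Benoît Lefévre, M.Sc.
print(n)   # M.Sc.;Benoît Lefévre,
```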
Sincerely,
Jeffrey