<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Matthew, <div><br><blockquote type="cite"><span class="Apple-style-span" style="font-family: Arial, Helvetica, 'Luxi Sans', sans-serif; font-size: 14px; white-space: pre; "><div style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I'm using vobject to parse data from a couple of CSV files and merge them to<br>vcards, eventually to import into Mac Address Book. One of the files is ASCII,<br>the other UTF-8. The UTF-8 file is survey data that I'm just munging and<br>putting into the Note field.
</div></span></blockquote><div><br></div><div>To avoid (or at least front-load) encoding issues when working with text, always decode to unicode as soon as you read the text in, and encode only when you write it back out.</div><div><br></div><div>It sounds to me like you aren't doing the decode step. If you're opening your UTF-8 file with the builtin open, use codecs.open with encoding='utf-8' instead, so that reads give you unicode rather than raw bytes. A good explanation of why to use codecs.open rather than the builtin open is at:</div><div><br></div><div><a href="http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python">http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python</a></div><div><br></div><div>Since you're working with CSV, though, you're probably also fighting the fact that the csv module in Python 2 doesn't support unicode input. Take a look at the examples at:</div><div><br></div><div><a href="http://docs.python.org/library/csv.html#csv-examples">http://docs.python.org/library/csv.html#csv-examples</a></div><div><br></div><div>specifically unicode_csv_reader. You probably want to use something like it to wrap your inbound CSV data.</div><br><blockquote type="cite"><span class="Apple-style-span" style="font-family: Arial, Helvetica, 'Luxi Sans', sans-serif; font-size: 14px; white-space: pre; "><div style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I think I understand the problem: vcard is an ASCII-only specification, and the<br>serialize method can't encode the unicode text as ASCII.
</div></span></blockquote><div><br></div><div>Actually, RFC 2426 (vCard 3.0) is all about unicode, not ASCII. vCard 2.1 used quoted-printable encoding, which makes older vCards a pain in the ass to parse, but it sounds like you're writing your own vCards, so you shouldn't need to mess with vCard 2.1 at all.</div><div><br></div><div>RFC 2426 *is* agnostic about how you encode your unicode. When parsing, vobject tries to decode UTF-8, UTF-16, and even ISO-8859-1, but when it serializes it always encodes as UTF-8. (I ought to make the output encoding configurable, but there hasn't been much of a clamor for anything but UTF-8.)</div><br><blockquote type="cite"><span class="Apple-style-span" style="font-family: Arial, Helvetica, 'Luxi Sans', sans-serif; font-size: 14px; white-space: pre; "><div style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I tried quoted-printable for the encoding, but Address Book didn't decode it</div><div style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">(left it as quoted-printable gobbledygook). Then I tried binary, and Address<br>Book crashed on import.<br></div></span></blockquote><br></div><div>Always assign unicode objects, not UTF-8-encoded byte strings, to the value of your ContentLines, and you shouldn't see the encode error. That specific error happens when the UTF-8 encoder is handed a non-unicode value and implicitly tries to convert it to unicode by decoding it as ASCII, which fails as soon as it hits a non-ASCII byte.</div><div><br></div><div>Sincerely,</div><div>Jeffrey</div></body></html>