
UTF Hell

Well, the posts are now imported. The following Perl one-liner was a lifesaver:

perl -C -pe 's/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;'

It converts every non-ASCII character into an XML numeric character reference (é, for example, becomes &#233;). The MT XML-RPC daemon wasn't too keen on accepting files containing UTF-8 characters (although that was probably the fault of the command-line poster I'm using...)
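
Below is a longer-hand sketch of the same substitution. It assumes the input files are UTF-8, and the script and sub names (to-ncr.pl, to_ncr) are hypothetical. One difference is deliberate. A bare -C only switches STDIN/STDOUT to UTF-8 when the locale variables (LC_ALL, LC_CTYPE, LANG) indicate a UTF-8 locale. Without that decoding, é is read as two separate bytes and comes out as &#195;&#169; instead of &#233;. This version decodes STDIN unconditionally.

```perl
#!/usr/bin/perl
# to-ncr.pl (hypothetical name): expanded form of
#   perl -C -pe 's/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;'
use strict;
use warnings;

# Replace every character outside 7-bit ASCII with a decimal numeric
# character reference, e.g. U+00E9 -> "&#233;", U+20AC -> "&#8364;".
sub to_ncr {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;
    return $s;
}

# Assumption: input is UTF-8. Decode it explicitly so each non-ASCII
# character is one Perl character rather than a run of raw bytes,
# whatever the locale says. Output needs no encoding layer because
# to_ncr leaves only ASCII behind.
binmode(STDIN, ':encoding(UTF-8)');

while (my $line = <STDIN>) {
    print to_ncr($line);
}
```

Usage would be something like `perl to-ncr.pl < post.txt > post-ascii.txt`, after which the output file is plain ASCII.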

I found the one-liner at: http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl

About

This page contains a single entry from the blog posted on August 19, 2006 11:19 PM.