iTunes parsing and Unicode characters

OK, so this may not be work related, but it’s something I’ve had in mind for a while.

My situation is this: I’m going to pick up an iMac as soon as Steve Jobs announces the update to the current iMac, which is already 331+ days old (only Apple can get away with selling almost year-old hardware for the same price as when it first came out). I’m migrating from Windows to OS X and really want to keep my copy of iTunes (50 GB of music and junk, and all the work I’ve put into organizing it). Luckily for me, the iTunes library is managed by an XML file. Sure, I could do a search and replace on the XML file and call it a day, but what fun would that be?

So I’m writing an iTunes XML parser. I figured it wouldn’t be very hard at all, seeing as I’ve already done a ton of XML parsing and testing for the Online Marking Tool.

For OLM, we used a package called PyXML to handle our XML parsing, so that’s what I did: I used PyXML and tried parsing the library that way. Unfortunately for me, I didn’t realize that PyXML is not being updated anymore and, worst of all, it’s SLOOOOW. I guess for OLM it doesn’t matter as much, since we don’t really handle large XML files (my iTunes library file is 4 MB, by the way). Parsing my iTunes library took an amazing 6 minutes, and the Python process took up over 1 GB of RAM to do it. That’s pretty damn inefficient.

After about 3 hours of playing around, I realized I’d better figure out a way to set up SVN on my MacBook, or else I’m going to be hating myself later on (seeing as version control saved my butt many times when working on OLM).

I stumbled upon this:

It was a very useful guide for working my way through setting up my SVN repository. 10 minutes later, version 1 of ‘iTunes Migration Tool’ had been created!

So that effort was lost: the PyXML route led me to a dead end. But I’m not one to give up easily, so I searched for another library. Python has a built-in XML library which, I found out, is much… MUCH better. Parsing my MacBook’s file took 4 seconds. Much happier.
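For anyone curious what parsing with the built-in library might look like, here is a minimal sketch in Python 3 using xml.etree.ElementTree. It is not the code from my tool: the sample XML, the dict_items and read_tracks helpers, and the track fields shown are made up for illustration, and it assumes the usual plist layout where a <dict> holds alternating <key> and value elements.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in the plist-style layout (illustration only).
SAMPLE = """<plist version="1.0">
<dict>
  <key>Tracks</key>
  <dict>
    <key>101</key>
    <dict>
      <key>Track ID</key><integer>101</integer>
      <key>Name</key><string>Example Song</string>
      <key>Location</key><string>file://localhost/C:/Music/Example%20Song.mp3</string>
    </dict>
  </dict>
</dict>
</plist>
"""


def dict_items(dict_elem):
    """Yield (key text, value element) pairs from a plist <dict> element."""
    children = list(dict_elem)
    # Entries are stored as alternating <key> and value elements.
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        yield key_elem.text, value_elem


def read_tracks(xml_text):
    """Return {track id: {field name: field text}} for every track."""
    root = ET.fromstring(xml_text)
    top = dict(dict_items(root.find("dict")))
    tracks = {}
    for track_id, track_elem in dict_items(top["Tracks"]):
        tracks[track_id] = {k: v.text for k, v in dict_items(track_elem)}
    return tracks


tracks = read_tracks(SAMPLE)
for track_id, fields in tracks.items():
    print(track_id, fields["Name"], fields["Location"])
```

For a real library file you would read from disk with ET.parse(path).getroot() instead of ET.fromstring. All values come back as text here, so numeric fields would still need converting.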

I also ran into another issue when changing the paths in the XML file: most of my tracks are in Chinese. Lucky for me, the Python library already does things in Unicode; all I had to do was find out what encoding OS X uses to read Chinese characters. The answer is UTF-16.
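Here is a rough sketch of the path rewriting step in Python 3. It rests on some assumptions that are not from my actual tool: that each track location is a percent-encoded file URL, that the old and new prefixes look like the made-up ones below, and that the encoding is passed in as a parameter (the UTF-8 default is just a placeholder, so swap in whatever your library file really uses). The remap_location helper is a hypothetical name.

```python
from urllib.parse import quote, unquote


def remap_location(location, old_prefix, new_prefix, encoding="utf-8"):
    """Swap the leading part of a percent-encoded file URL.

    The default encoding is an assumption for this sketch, not a claim
    about what iTunes or OS X uses.
    """
    # Decode the percent-escapes so we work with real Unicode characters.
    path = unquote(location, encoding=encoding)
    if not path.startswith(old_prefix):
        return location  # Not under the old music folder; leave untouched.
    new_path = new_prefix + path[len(old_prefix):]
    # Re-encode, keeping '/' and ':' readable so the URL shape survives.
    return quote(new_path, safe="/:", encoding=encoding)


# Made-up example paths with Chinese artist and track names.
old = "file://localhost/C:/Music/" + quote("周杰倫/晴天.mp3")
new = remap_location(
    old,
    "file://localhost/C:/Music/",
    "file://localhost/Volumes/Music/",
)
print(new)
print(unquote(new))
```

Decoding before the prefix swap means the comparison happens on real characters, so it does not matter how the Chinese names were escaped in the file.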

So now I think I’m nearly done; I just need to write the part of the code that will copy the files over to my external hard drive :) I’ll probably touch it up, finish it, and post it somewhere sometime 🙂


