Coderific

blog - of umlauts, unicode, and busted rss feeds

posted by witten on September 10, 2007

It came to my attention today that multiple Coderific rss feeds have been busted for quite a while. I really should've noticed earlier, given that I'm subscribed to a few of the feeds myself and none of them have been updating in my rss reader, even though new ratings have been pouring in on the site.

The specific bug turned out to be a CherryPy problem in which one of the handlers barfed unceremoniously on unicode strings (http://www.cherrypy.org/ticket/511). Since Coderific deals with everything in unicode, all that it took to trigger this bug was for someone to enter an employer rating with an umlaut, and then boom, no rss feeds. The fix was to apply one of the CherryPy patches described in that ticket.

The moral of this story is threefold. First, I need more automated tests. Not just unit tests, but end-to-end functional tests for things like umlauts breaking feed or HTML generation. That's really the sort of thing that can be tested ahead of time.
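Here's a minimal sketch of the kind of functional test I have in mind. It isn't Coderific's actual code: `render_feed` is a hypothetical stand-in for the real feed generator, and the employer name is made up. The idea is just to push a rating with umlauts through feed generation and check that what comes out the other end still parses and still contains the original text.

```python
import xml.etree.ElementTree as ET


def render_feed(ratings):
    """Hypothetical stand-in for the feed generator: returns RSS as UTF-8 bytes."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Employer ratings"
    for employer, comment in ratings:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = employer
        ET.SubElement(item, "description").text = comment
    return ET.tostring(rss, encoding="utf-8")


def test_feed_survives_umlauts():
    feed_bytes = render_feed([("Müller GmbH", "Schöne Büros, gutes Team")])
    # Parsing raises an error if the feed is not well-formed XML.
    channel = ET.fromstring(feed_bytes).find("channel")
    titles = [item.findtext("title") for item in channel.findall("item")]
    # The non-ASCII employer name must round-trip unchanged.
    assert titles == ["Müller GmbH"]


test_feed_survives_umlauts()
```

A test like this would only catch the real bug if it exercised the whole request path, web framework included, so in practice it would fetch the feed over HTTP rather than call a function directly.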

Second, I need to keep a better eye on the site. I shouldn't assume that everything is humming along just because I haven't heard about any problems from Coderific's users. In fact, just this morning I thought to myself, "Why, Coderific is quite an engineering achievement! It seems to keep on running with very little maintenance." I suppose the gods of irony heard that thought and took the matter into their own hands. And I wouldn't mind at all if you, dear Coderific user, emailed me if you notice anything going even slightly awry on this site.

And finally, unicode in the current versions of Python is a second-class citizen. Everything defaults to 7-bit ASCII, and you have to go to great lengths to make a program unicode-aware. Even if you do make the program use unicode strings throughout, as is the case with Coderific, you've still got to worry about all the libraries you're depending on, and how well they play with unicode. I'm really looking forward to Python 3, in which all strings will be unicode by default.
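To illustrate the kind of failure involved, here's a small sketch. It is not CherryPy's actual code, and the rating text and the `encode_or_none` helper are made up for the example. It's written in Python 3 syntax, where the encoding step is explicit; current Python 2 performs the same ASCII conversion implicitly whenever byte strings and unicode strings are mixed, which is how a single umlaut can blow up code that never mentions an encoding at all.

```python
# Hypothetical rating text containing non-ASCII characters.
rating = "Großartiger Arbeitgeber in Zürich"


def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec can't represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# The ASCII codec has no representation for the eszett or the umlaut.
ascii_bytes = encode_or_none(rating, "ascii")

# UTF-8 can represent every unicode character, so this succeeds.
utf8_bytes = encode_or_none(rating, "utf-8")

assert ascii_bytes is None
assert utf8_bytes.decode("utf-8") == rating
```

The fix in a library usually amounts to choosing an explicit encoding such as UTF-8 at the point where text becomes bytes, instead of relying on the default.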

So if you suddenly received a deluge of Coderific rss updates in your reader today, now you know why: Insufficient tests, attention, and unicode!
