I have this little project that’s been 95% finished for months, ‘blogbot 2’, a Python script which monitors a list of RSS/Atom feeds daily and sends out an email with all the new entries to a mailing list. Blogbot 1 was a Ruby script where I rolled my own feed parser using Ruby’s built-in XML parser. Ruby’s parser rocks, but my wielding of it was only 90% good enough, so I wanted to leverage somebody else’s work and rewrite it in Python using the Universal Feed Parser.
I kept stumbling over Unicode issues. The UFP is so correct that it often returns Unicode, and Python’s Unicode handling, while very good if you know how to use it, is pretty sucky if you are me. The default handling for unicode-to-string conversion is to use the ‘ascii’ encoding, which can handle practically nothing beyond plain English text, and the default behavior when it hits something it can’t handle is to throw up its arms and scream in mute horror (that is, raise a UnicodeError). You cannot globally change these defaults; you have to catch each conversion and override the encoding or the error handling one at a time.
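For what it’s worth, here is a minimal sketch of what handling it conversion by conversion can look like. It is written for Python 3, where text-to-bytes conversion is always explicit (that is an assumption on my part; the implicit conversions I was fighting are Python 2 behavior). The helper name `to_bytes` and the sample title are made up for illustration and are not taken from blogbot.

```python
def to_bytes(text, encoding="ascii", errors="strict"):
    """Encode text explicitly instead of relying on a default codec."""
    return text.encode(encoding, errors)


# Hypothetical feed title containing an accented e and an en dash.
title = "Caf\u00e9 \u2013 new post"

try:
    to_bytes(title)  # strict ASCII cannot represent the non-ASCII characters
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# Two ways out: pick an encoding that can hold the text...
utf8_title = to_bytes(title, "utf-8")
# ...or keep ASCII and substitute '?' for what it cannot represent.
ascii_title = to_bytes(title, "ascii", "replace")
```

Encoding to UTF-8 keeps everything; the `replace` handler loses the non-ASCII characters but never raises, which may be good enough for a plain-text email digest.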
I finally got it all figured out and it’s working great. But I tore hair out over that last 5%.
Another project I’m working on is “artblog/imageblog” software. I want something which does no more nor less than I want it to, and does it well and easily. I’m hacking this one together in PHP/MySQL, and it’s been fun learning PHP. (Hint to anyone going to use PHP/MySQL: use Pear DB if you possibly can. It will rock your world. MUCH better API than the default mysql stuff.)
Anyway, I’m doing the artblog thing very gradually, and that’s turning out to be a good thing, because it means I don’t spend too long going in the wrong direction. The big trick with PHP/MySQL is to decide what ought to be a PHP issue and what ought to be a MySQL issue. At one point I had this concept where I was going to have thumbnail images listed in the same table as fullsize images, and have a many-to-one relation between them via an extra column in that table that related back to itself. (This was actually my second iteration. In the first, thumb images were in their own table with a many-to-one relationship to the main table… this was while I still thought I would actually care to have multiple thumbnails for the same image, which I decided in the end I never would.)
Anyway, I ended up deciding “screw all that — I’m going to have thumbnails be created automagically if they don’t already exist via PHP magic, and the database isn’t going to know or care about them.” I coded it up and it’s awesome.
The thing is, I had all of these ideas in the time I wasn’t working on the project. I’d work on it for a while and then do other things (cause I don’t have that much spare time for it) for a day or three at a time. Then by the time I’d come back I’d have had a lot of time for those “standing there in the shower and suddenly realize how the system ought to be designed” moments.
This project is going to be better than it would have been because it’s gradual. All these people are on about “rapid/agile development” — well, I think development is more agile when it’s less rapid. Because if you move fast, then you get a lot coded the way you first thought of doing it, which is seldom the best way to do it.
This is not a friendly world for gradual activity though, and the programming world is especially unfriendly to gradual development.
Ah well.
“I finally got it all figured out and it’s working great. But I tore hair out over that last 5%.”
Great! I could use exactly this. Where can I get it?
I threw a tarball together: http://www.goesping.org/blogbot.tgz
Tell me if it works for you.
I stumbled across a lot of python unicode problems when writing various screenscrapers, and sometimes ended up throwing in a few regexps to get rid of the data that just refused to convert. One of these days I’ll learn to use unicode properly, and convince my server admins to set python up to be a little more flexible in that regard.
How are you finding the performance of PHP drawing images from a database and generating thumbnails on the fly? I tend to store images and thumbnails in particular folders, named for a database ID, but I’ve often wondered about storing all the data in MySQL.
Oh, that’s not what I’m doing. It seems to me that databases are for textual information, not for binary data — I know it’s *possible* to store binary data in a database, but it just seems like asking for trouble to me. So I have a directory for fullsize images, and when something is uploaded the file is copied to that directory, and an entry in an ‘images’ table is created with that filename (hmm… I don’t check for filename clashes. I should.) When a thumbnail is needed (by PHP) for an image, PHP checks to see if there is a thumbnail in the ‘thumbs’ directory with the same name as the image in the ‘images’ directory, and a later creation date. If there isn’t, it creates one using gd. So the thumbnail creation is, in effect, dynamic-but-cached rather than purely dynamic.
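The dynamic-but-cached check can be sketched roughly like this. It is in Python rather than the PHP I actually use, just to keep it short and runnable. It compares modification times (an assumption; that is the closest portable stand-in for “a later creation date”), and `render` is a hypothetical callable standing in for the gd resize step. The function names are made up for illustration.

```python
import os


def thumbnail_is_stale(image_path, thumb_path):
    """True if the thumbnail is missing or older than the full-size image."""
    if not os.path.exists(thumb_path):
        return True
    return os.path.getmtime(thumb_path) < os.path.getmtime(image_path)


def ensure_thumbnail(filename, images_dir, thumbs_dir, render):
    """Return the thumbnail path, calling render(src, dst) only when needed.

    render is a hypothetical callable that writes a resized copy of src
    to dst; it stands in for the real image-resizing code.
    """
    image_path = os.path.join(images_dir, filename)
    thumb_path = os.path.join(thumbs_dir, filename)
    if thumbnail_is_stale(image_path, thumb_path):
        render(image_path, thumb_path)
    return thumb_path
```

The expensive resize only happens on the first request for a thumbnail, or after the full-size file has been replaced; every other request is just two file stats.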
I’m not sure how dreamhost.com feels about big piles of binary data in their MySQL databases, but I suspect many hosting services would consider it rude.
I kind of like the idea of naming a folder after a database id and keeping an image and its thumbnail in it though. I’ll think about whether that solution would have any advantages over what I’ve got.
My messy syntax is at fault there — the images are named after the database ID rather than the folders. The other could work, I guess.
Would blogbot also be able to send the new stuff to a Jabber user instead of an email list? Surely Python must have Jabber stuff available.
For my fullsize/thumb issues, I usually have the fullsize file have a name something like myimage.jpg and the thumb have a name like myimage_t.jpg. Then in the database my image field holds “myimage”. Then in your php you can tack on a _t if you want the thumb, and not if you want fullsize.
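That naming scheme could be sketched like this (in Python rather than PHP, just to keep the example short and runnable). The function name `image_filename` and the `.jpg` default are only illustrative:

```python
def image_filename(base, thumb=False, ext=".jpg"):
    """Build a filename from the base name stored in the database.

    The database holds only the base ("myimage"); the "_t" suffix is
    tacked on when the thumbnail is wanted.
    """
    suffix = "_t" if thumb else ""
    return f"{base}{suffix}{ext}"


fullsize = image_filename("myimage")
thumbnail = image_filename("myimage", thumb=True)
```

Here `fullsize` comes out as `myimage.jpg` and `thumbnail` as `myimage_t.jpg`, so the database never needs a separate column for the thumb.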