I have a folder full of old mail in Apple’s Mail application. It’s gigantic: about 70,000 messages. Most of them are duplicates. The folder is what I got from finding old folders of mail upon old folders of mail and merging them all into one great hoard. The number of genuinely distinct messages is probably a smallish fraction of 70,000.
What’s worse, some of the 70,000 are blank. A while ago I made an inept attempt at a Python script to clean up a similar uber-mail-folder. Somehow it destroyed the bodies of a lot of old emails and left their headers intact. So my gigantic folder has many duplicates, but some of them aren’t true duplicates, because their bodies are missing.
I want to eliminate all the duplicate messages, and there are scripts for Apple Mail that do this. Only one of them is a script I would trust to keep the real message rather than its bodiless copy. That one chokes and fails on a folder this large. (It also choked and failed on a smaller folder. Maybe something changed in Leopard that breaks it.)
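The behavior I want from a deduplicator is this: drop duplicates, but never keep a bodiless copy when a complete one exists. Here is a minimal Python sketch of that behavior using the standard `email` package. It is not the Apple Mail script mentioned above. It assumes that duplicates share a `Message-ID` header and falls back to From/Date/Subject when that header is missing. It also assumes that any part with a filename counts as an attachment. `has_body` and `dedupe` are hypothetical names.

```python
import email.message


def has_body(msg: email.message.Message) -> bool:
    """Return True if any leaf part is a named attachment or has non-whitespace content."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        if part.get_filename():
            return True  # assumption: any part with a filename counts as an attachment
        if (part.get_payload(decode=True) or b"").strip():
            return True
    return False


def dedupe(messages):
    """Keep one message per key, preferring a copy that still has a body."""
    kept = {}   # key -> message chosen so far
    order = []  # keys in first-seen order
    for msg in messages:
        # Assumption: duplicates share a Message-ID; otherwise fall back to From/Date/Subject.
        key = (msg.get("Message-ID") or "").strip() or (
            msg.get("From"), msg.get("Date"), msg.get("Subject"))
        if key not in kept:
            kept[key] = msg
            order.append(key)
        elif not has_body(kept[key]) and has_body(msg):
            kept[key] = msg  # swap a bodiless copy for one that has content
    return [kept[k] for k in order]
```

A bodiless copy is only ever replaced and never preferred. The kept message therefore has a body whenever any copy of it does, no matter which copy arrives first.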
So I decided to go through and destroy every message with a blank body. They’re no use to me, and they make it dangerous to get rid of duplicates. I tried exporting everything to an mbox-format file and using some of Python’s nice mailbox-manipulation libraries. The file was insanely large, though, and Python on my MacBook staggered under its weight. (Besides, my use of Python caused this problem a while back…)
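For reference, the mbox approach might look something like the following sketch, built on the standard library’s `mailbox` module. It assumes a standard mbox export. `is_blank` and `winnow` are hypothetical names. The sketch copies messages into two new mbox files instead of deleting anything in place, which mirrors the blank/done split of the AppleScript approach. It may still be slow on a file this large.

```python
import mailbox


def is_blank(msg: mailbox.mboxMessage) -> bool:
    """Return True if no leaf part is a named attachment or has non-whitespace content."""
    for part in msg.walk():
        if part.is_multipart():
            continue
        if part.get_filename():
            return False  # assumption: a filename marks an attachment
        if (part.get_payload(decode=True) or b"").strip():
            return False
    return True


def winnow(src_path: str, blank_path: str, done_path: str) -> tuple:
    """Copy each message in src_path into blank_path or done_path; return (blank, done) counts."""
    src = mailbox.mbox(src_path, create=False)
    blank = mailbox.mbox(blank_path)  # created if missing, appended to otherwise
    done = mailbox.mbox(done_path)
    n_blank = n_done = 0
    try:
        # mbox indexes message offsets in one pass, then reads messages
        # one at a time rather than loading the whole file at once.
        for msg in src:
            if is_blank(msg):
                blank.add(msg)
                n_blank += 1
            else:
                done.add(msg)
                n_done += 1
    finally:
        # close() also flushes any pending changes to disk.
        for box in (src, blank, done):
            box.close()
    return n_blank, n_done
```

A call such as `winnow("hoard.mbox", "blank.mbox", "done.mbox")` (hypothetical file names) leaves the original export untouched. Given how the blank bodies came about, that is the safer choice.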
So eventually I turned to AppleScript. (I first tried using rb-appscript, but it turns out I don’t need any special Rubyness for this, and it’s easier to learn from examples of AppleScript on the web if I don’t have to translate them into Ruby before I use them.)
I wrote a script in Apple’s Script Editor called “Winnower.” It takes the messages in a mailbox called “doing” and sorts them into two mailboxes, “blank” and “done,” depending on whether each message has any body text or attachments. I put a few thousand messages at a time into “doing” and then run the script. (The full 70,000+ message folder was too much for this script, too.)
It looks like this:
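```applescript
-- Winnower: sort the messages in "doing" into "blank" or "done".
tell application "Mail"
    set doingbox to mailbox "doing"
    set blankbox to mailbox "blank"
    set donebox to mailbox "done"
    -- Collect the message references up front, before any are moved.
    set doingmessages to messages of doingbox
    repeat with thisMessage in doingmessages
        -- Treat a body made only of spaces, tabs, and line breaks as empty.
        ignoring white space
            if mail attachments of thisMessage is {} and content of thisMessage is equal to "" then
                -- No attachments and no text: the body was lost.
                move thisMessage to blankbox
            else
                move thisMessage to donebox
            end if
        end ignoring
    end repeat
end tell
```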
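The script above works through a few thousand messages per run because the full folder was too much for it. With an mbox export, the same batching could be done ahead of time. Here is a minimal Python sketch, assuming a standard mbox file. The `split_mbox` function, the `batch-NNNN.mbox` naming, and the default of 3,000 messages per batch are all hypothetical choices, not part of the workflow above. The sketch still has to scan the large file once, so it may not avoid the slowdown described earlier.

```python
import mailbox
import os


def split_mbox(src_path: str, out_dir: str, batch_size: int = 3000) -> list:
    """Write src_path's messages into numbered mbox files of at most batch_size messages."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    src = mailbox.mbox(src_path, create=False)
    paths, current = [], None
    try:
        for i, msg in enumerate(src):
            if i % batch_size == 0:  # time to start a new batch file
                if current is not None:
                    current.close()
                path = os.path.join(out_dir, f"batch-{i // batch_size:04d}.mbox")
                current = mailbox.mbox(path)
                paths.append(path)
            current.add(msg)
    finally:
        if current is not None:
            current.close()
        src.close()
    return paths
```

Each resulting file could then be imported into Mail and winnowed on its own, one batch per run.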