Saturday, May 10, 2003
I'm actually in a really bad mood, so I'm working on "unsolvable" problems to get my mind off things.
For a good portion of the week, I've been thinking through the problem of grouping emails reliably into threads. That means turning all those various "Re:Re: You are a putz" and "Fwd:I thought this was funny but I have a poor sense of humor" chains of emails into their own groups of messages. So the original "You are a putz" message has all of its child responses grouped under it (at least in the data itself: not necessarily in your view of the emails). It's a surprisingly difficult problem, since every piece of mail software handles even such basic functionality as replying to a message differently, and none of the popular email clients (Microsoft products in particular) follow the long-established specifications for what information should go where in an email's envelope (bet you didn't know email had envelopes, huh? that's where all the really good stuff is hidden, like the subject of your email). So coming up with a solution that can handle even most kinds of email messages is pretty difficult. Every time your store of email changes - like when you get new mail - you have to go through all of your mail and rebuild a sort of "mental map" of which messages are connected to which other messages.
This tends to get complex very quickly. For a mailing list or newsgroup service, it's much easier since you see pretty much all sides of a conversation. With an email client program, however, you are usually missing parts of it (your outgoing email, usually), and dealing with that is also complicated.
After a lot of searching with no end in sight, I actually happened across [this small paper] on the subject, which also happens to be part of a draft to [extend IMAP to handle threads]. There is a slightly easier to read explanation of the algorithm [here]. Though it was hard for me to find, thank god that [jwz] ([blog]) was able to make it public. Unfortunately, it's pretty difficult to implement, and the Grendel implementation he wrote is pretty far from being a standalone object or class; it has some odd dependencies that make it difficult to read and work with. So I now don't have to think about how to build threads so much, but I do have to implement someone else's deeply nested process, which ain't fun.
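To give a feel for the core of it, here is a rough Python sketch of just the linking step: walk each message's References chain, create placeholder containers for messages you never received, and hang each message under the last thing it references. This is a simplification under my own assumptions, not jwz's code or the IMAP draft. The field names (`id`, `references`) and the `Container` class are made up for illustration, and the later steps of the full algorithm (pruning empty containers, falling back to grouping by subject) are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Container:
    """A slot in the thread tree; message stays None for mail we never saw."""
    message_id: str
    message: Optional[dict] = None
    parent: Optional["Container"] = None
    children: list = field(default_factory=list)


def is_ancestor(candidate, node):
    """True if candidate is node itself or sits somewhere above it."""
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False


def link(parent, child):
    """Attach child under parent, unless that would create a loop."""
    if child.parent is parent or is_ancestor(child, parent):
        return
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """messages: dicts with an 'id' and a 'references' list, oldest first.

    Both key names are assumptions for this sketch, not a real mail API.
    Returns the root containers, one per thread.
    """
    table = {}

    def container_for(message_id):
        if message_id not in table:
            table[message_id] = Container(message_id)
        return table[message_id]

    for msg in messages:
        this = container_for(msg["id"])
        this.message = msg
        previous = None
        for ref_id in msg.get("references", []):
            ref = container_for(ref_id)
            # Only fill in links we don't already know about.
            if previous is not None and ref.parent is None:
                link(previous, ref)
            previous = ref
        # The last reference is the message being replied to.
        if previous is not None:
            link(previous, this)

    return [c for c in table.values() if c.parent is None]
```

The placeholder containers are what make this usable in a mail client rather than a list archive: a reply to something you never received (or to your own outgoing mail that wasn't saved) still ends up grouped under an empty parent instead of floating free.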
There's a good reason that I'm implementing what some people would think is superfluous functionality fairly early in the development of an experimental portion of the UI. Yeah, say that 10 times fast. Every discussion is important, and the links between messages are often more important than the conversation itself. Rarely does an email message make any sense when it's alone. Taking it out of the context of the discussion it's a component of greatly reduces its value. And that said, every mail message window that's part of a larger conversation will have a display readily available that can take you to the other parts of the conversation. So it's quite a bit like any message on [cocoa.mamasam.com].
The other problem I am working on is People, or the "Peeps" map. Some email clients include a "PIM" or Personal Information Manager. These were all the buzz a few years back, but they're really just Rolodexes. Two contact managers that stand apart from the glorified Rolodexes (like the MacOS X AddressBook) are [SBook5] and [Six Degrees]. SBook5 uses AI to build your contact database. Six Degrees scans your email (unfortunately, only working with Entourage/Outlook, not even IMAP) and builds a database from that, mapping connections between contacts.
Remember when I said the links between things are often more important than the things themselves?
Creating a map of associations between contacts is important. Creating one that's fast to search is important for EvilToaster, since things like auto-completion of addresses are nice to have. Six Degrees builds a map based on the idea of "Six Degrees of Separation". A [project] at Columbia is trying to find out if the Six Degrees of Separation idea is true, but for our purposes it might as well be. The social networks we form through electronic communication are important not only because of the people forming individual nodes, but because of the shapes the network takes. The funny thing is, these kinds of maps of associations are very much like the kinds of data structures I used to work with in 3D, and leveraging my experience with 3D meshes is helping this along greatly.
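As a rough illustration of what such a map could look like, here is a small Python sketch: an undirected graph where each edge counts how often two addresses show up on the same message, plus a sorted list of addresses for prefix auto-completion. It is only a guess at the general shape, not EvilToaster's actual design or how Six Degrees works. The `PeepsMap` class and its method names are invented for the example.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import combinations


class PeepsMap:
    """Hypothetical contact map: who appears on messages with whom."""

    def __init__(self):
        # links[a][b] = number of messages that a and b both appear on
        self.links = defaultdict(lambda: defaultdict(int))
        # Kept sorted so prefix lookups can use binary search.
        self._sorted = []

    def add_message(self, addresses):
        """Record one message's participants (From, To, Cc all together)."""
        people = sorted({a.strip().lower() for a in addresses if a.strip()})
        for person in people:
            spot = bisect_left(self._sorted, person)
            if spot == len(self._sorted) or self._sorted[spot] != person:
                self._sorted.insert(spot, person)
        for a, b in combinations(people, 2):
            self.links[a][b] += 1
            self.links[b][a] += 1

    def complete(self, prefix):
        """All known addresses starting with prefix, in sorted order."""
        prefix = prefix.lower()
        start = bisect_left(self._sorted, prefix)
        matches = []
        for address in self._sorted[start:]:
            if not address.startswith(prefix):
                break
            matches.append(address)
        return matches

    def closest(self, address, limit=3):
        """Addresses most often seen alongside the given one."""
        neighbours = self.links.get(address.lower(), {})
        return sorted(neighbours, key=lambda n: (-neighbours[n], n))[:limit]
```

Feeding it the participants of every message as mail arrives would keep both the graph and the completion list current, and `closest` could rank completion results by how often you actually correspond with someone.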
That doesn't make it a lot easier though. It's still very hard work, and a lot of thought has to go into it. A lot of people would ask why I'm not using some kind of relational database for these things - relational databases, when actually being used for more than glorified hash tables, are slow and ugly and heavy. I looked at using [Prevayler], but the jury is still out on that in my mind. That could easily just be used as a post-step for serializing the data.
[ 5/10/2003 12:37:00 AM ]