Friday, February 21, 2003

And now for a short rant on filters...

Six or so months ago, Bayesian methods for filtering spam got a lot of attention after Paul Graham posted his article [A Plan for Spam] and Slashdot linked to it. Inevitably, within hours or days there were dozens of new SourceForge projects devoted to Bayesian spam filters in different languages, with different APIs, etc., some with implementations and some with only daydreams.

I guess some of those guys don't get that this is statistical text classification. The idea here is that as you read mail, you have a "This is Junk" button that you hit when you see spam. Over time the system collects information that helps it identify spam messages, and it gets better and better at keeping you from seeing them in the first place.

Hi! I don't know about you, but I get mail other than spam too. Why aren't they using it for more than spam? Why don't they use it to identify flames from Usenet, or receipts from online purchases, or work vs. personal email?

It's not a big step to take. It's no giant leap from IDing spam to IDing email that requires immediate attention. You would think someone other than me would have made it by now.
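
To show how small the step is, here is a rough sketch of a mail sorter that learns any number of labels instead of just "junk". It assumes plain multinomial naive Bayes with add-one smoothing, which is not the exact formula from Graham's article, and the class name, labels, and sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesMailSorter:
    """Hypothetical multi-label sorter: naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.message_counts = Counter()          # label -> messages seen
        self.vocabulary = set()                  # every word seen so far

    @staticmethod
    def tokenize(text):
        # Crude tokenizer (an assumption): lowercase runs of letters,
        # digits, and a few symbols such as "$".
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, text, label):
        """The "This is Junk" button, generalized to any label."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.message_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        if not self.message_counts:
            raise ValueError("no training data yet")
        words = self.tokenize(text)
        total_messages = sum(self.message_counts.values())
        vocab_size = max(len(self.vocabulary), 1)
        best_label, best_score = None, -math.inf
        for label, seen in self.message_counts.items():
            # Prior: how often this label has been used so far.
            score = math.log(seen / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words never zero out a label.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


sorter = NaiveBayesMailSorter()
sorter.train("Cheap pills, buy now, limited offer", "spam")
sorter.train("Win money now, click here for your free prize", "spam")
sorter.train("Your order has shipped. Order total: $23.10. Receipt attached", "receipt")
sorter.train("Thanks for your purchase, your receipt and order number are below", "receipt")
sorter.train("Meeting moved to 3pm, please review the project schedule", "work")
sorter.train("Please send the status report before the project meeting", "work")

print(sorter.classify("Receipt for your order"))
```

Nothing in there knows what spam is. The only thing that changed from a junk filter is that the button takes a label, so "urgent", "flame", or "personal" would work the same way given enough examples.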