POPFile Revisited

Filed under:General — eric @ 5:16 am

Back on the first of November, I told you about POPFile, an email classifier/spam filter built on Bayes’ theorem, a piece of probability math that dates back to the 1700s. Now that three months have gone by, I thought I’d let you know how it’s worked.

I can safely say that this is the best thing that has ever happened to my email. You probably know that I’ve been on the ‘net for a long time, and I’ve never shied away from putting my email address out there. This means that I get a lot of spam. A Lot. And over the years, I’ve tried every spam filter at my disposal, and none were all that satisfactory. And then along came POPFile.

I’ve got quite a few email accounts, and for simplicity, they all eventually end up on the same computer. I try to have different accounts for different purposes, such as theater, farm, work, etc., but many of my friends overlap those purposes, so sorting by incoming address isn’t very reliable. POPFile handles that, too.

In POPFile, you create as many buckets as you want to sort your email. I’ve got six: farm, personal, spam, townandgown (my theater), work, and yahoolists (which actually is a general-purpose mailing list bucket). After you create your buckets, you feed them email as it comes in, and they quickly learn what emails to grab. The email then gets labeled (either by prepending the bucket name to the subject or adding a custom email header, depending on your email reader’s capabilities and your wishes). So, when you finally read it, it’s labeled and then sorted by your reader. Even with these six, somewhat fuzzily defined, buckets, POPFile has been very accurate:

Just over 98%, and most of the errors would have been hard for a human to classify correctly, such as when a co-worker sends me a theater-related email. It was over 99% until a new version of POPFile came out this week. It has a better algorithm, but it had to re-learn a few emails to get things straight again. So, out of just over 9000 incoming emails, 177 got mislabeled. A couple of those were “false positives” (real email labeled as spam), but they are extremely rare. Here’s how many were sent to each bucket:

All those emails, and 41% of them were spam. That’s several thousand trash emails over the last few months wasting my resources. I’ve got my email reader automatically deleting them, so I never have to see them any more. Yay!

The system works by building a dictionary of words, assigning probabilities to each so it can compile an overall probability that a given piece of email goes in a specific bucket. It takes a surprisingly small dictionary to achieve these results:

My work bucket has over a million words in it, but almost all of them are from a humongous attachment containing computer code that I accidentally fed it the first day I was using the system. The others range from about 8000 words up to 41000 words. That’s really not very many feedings. And when it gets something wrong, I correct it (using a very simple web interface), and that refines the dictionary accordingly.
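To make the dictionary idea concrete, here is a minimal Python sketch of a word-count ("naive Bayes") classifier with buckets. It is an illustration under assumptions, not POPFile's actual code: the class name `BucketClassifier`, its method names, the simple tokenizer, and the add-one smoothing are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


class BucketClassifier:
    """Toy multi-bucket naive Bayes classifier (illustrative only)."""

    def __init__(self, buckets):
        # One word-count "dictionary" per bucket, plus a message tally.
        self.words = {b: Counter() for b in buckets}
        self.messages = {b: 0 for b in buckets}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphanumeric runs are good enough as "words".
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, bucket, text):
        """Feed one email to a bucket."""
        self.words[bucket].update(self.tokenize(text))
        self.messages[bucket] += 1

    def untrain(self, bucket, text):
        """Undo an earlier feeding of this exact text."""
        self.words[bucket].subtract(self.tokenize(text))
        self.words[bucket] = +self.words[bucket]  # drop zero/negative counts
        self.messages[bucket] -= 1

    def reclassify(self, wrong, right, text):
        """Correct a mistake: move the email's words to the right bucket."""
        self.untrain(wrong, text)
        self.train(right, text)

    def scores(self, text):
        """Return a log-probability score per bucket for this email."""
        vocab_size = len(set().union(*self.words.values())) or 1
        total_messages = sum(self.messages.values())
        tokens = self.tokenize(text)
        result = {}
        for bucket, counts in self.words.items():
            # Prior: how often mail lands in this bucket (add-one smoothed).
            score = math.log(
                (self.messages[bucket] + 1) / (total_messages + len(self.words))
            )
            bucket_total = sum(counts.values())
            for word in tokens:
                # Likelihood of each word given the bucket (add-one smoothed).
                score += math.log((counts[word] + 1) / (bucket_total + vocab_size))
            result[bucket] = score
        return result

    def classify(self, text):
        """Pick the bucket with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    c = BucketClassifier(["spam", "theater"])
    c.train("spam", "buy cheap pills now")
    c.train("theater", "rehearsal tonight at the theater")
    print(c.classify("cheap pills"))
```

Working in log space keeps long emails from underflowing to zero, and the add-one smoothing means a word never seen in a bucket lowers that bucket's score without ruling it out entirely. The `reclassify` method mirrors the correction step described above: the mislabeled email's words are removed from the wrong bucket and added to the right one.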

POPFile is platform independent and requires only that you get your email through a POP account, so it won’t work on web-based systems like Yahoo mail, for instance. To make things even simpler for Windows users, there is an exe installer available that does everything but configure your email reader and set up buckets, and the step-by-step instructions will guide you from there.

You might not be getting as much spam as I am (though it probably feels like it sometimes), but if you get any at all, POPFile is worth it. It’s impossible for spammers to fool (unlike most of the other filters out there), so if enough people had systems like this in place, maybe the spammers would finally give it up.


  1. Hey Eric

    So glad this is working so well
    for you.

    POPFile’s going to be quite tough
    for the spammers to work around.

    Eventually, when someone’s selling
    you crap, they’ve got to tell you
    that. Bingo, plonk, you’ve shown
    yourself, you’re ID’d, you’re gone.

    It is so NICE to get one’s Inbox back



    Comment by Stan Krute — 2/13/2003 @ 1:17 am

  2. I just have one thing to say…I have a great business opportunity for you adding inches while losing weight and growing real hair…

    Comment by M — 2/13/2003 @ 3:18 am

  3. Spam bucket her, Eric! Yeah! Do it! Spam bucket! Spam bucket! Spam bucket!

    Comment by Matt — 2/14/2003 @ 9:09 am

  4. As you know, the more buckets you use with POPFile, the greater the chance of classification errors. I’m happy to report that I currently have 18 buckets in use with my 6 email accounts, and even so I have a success rate of over 97%. It’s amazing how well it works. I don’t get a lot of spam, but it works really well as an email classification tool.

    Comment by Mike J. — 3/4/2003 @ 9:52 am
