Interesting email spam statistics

So I was looking at PopFile’s statistics page.

Messages Classified

Bucket         Classification Count   False Positives   False Negatives
other               0  (0.00%)                0                 0
personal       34,557 (89.30%)               60                94
spam            3,978 (10.28%)               79               108
work               39  (0.10%)                6                 2
unclassified      122  (0.31%)               59                 –

This is really interesting. Of my entire email volume of 38,696 emails (and this doesn’t include Gmail!), only 3,978 were spam, so only about 10% of incoming messages are spam. But the word counts are even more interesting:
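The percentages are easy to recompute from the table. A minimal Python sketch, with the counts typed in by hand from the PopFile page (the dictionary is just for illustration, not anything PopFile exports):

```python
# Counts copied by hand from the "Messages Classified" table above.
counts = {
    "other": 0,
    "personal": 34_557,
    "spam": 3_978,
    "work": 39,
    "unclassified": 122,
}

total = sum(counts.values())  # every message PopFile has seen
shares = {bucket: n / total for bucket, n in counts.items()}

print(f"total messages: {total:,}")
for bucket, share in shares.items():
    print(f"{bucket:<13} {share:7.2%}")
```

The last digit may not always match PopFile's display: 122/38,696 is about 0.3153%, which prints as 0.32% here, while PopFile shows 0.31%.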

other 0
personal 10,113
spam 10,313
work 699

Notice that the distinct word counts for personal and spam differ by only 200 (10,313 vs. 10,113)?

But the best part:

Classification Accuracy

Messages classified: 38,696
Classification errors: 204

Accuracy: 99.47%
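The accuracy figure can be reproduced from the false positive and false negative columns. This sketch assumes, as the matching column totals suggest, that each reclassified message counts once as a false positive for the bucket it was filed in and once as a false negative for the bucket it belonged in:

```python
# Columns copied by hand from the PopFile table above.
false_positives = {"other": 0, "personal": 60, "spam": 79, "work": 6, "unclassified": 59}
false_negatives = {"other": 0, "personal": 94, "spam": 108, "work": 2}
total_messages = 38_696

# Assumption: each reclassified message adds one false positive (to the bucket
# it was filed in) and one false negative (to the bucket it belonged in),
# so both columns should sum to the same error count.
errors = sum(false_positives.values())
assert errors == sum(false_negatives.values())

accuracy = 1 - errors / total_messages
print(f"errors: {errors}, accuracy: {accuracy:.2%}, error rate: {1 - accuracy:.2%}")
# prints: errors: 204, accuracy: 99.47%, error rate: 0.53%
```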

99.47%, so a 0.53% error rate (204 errors out of 38,696 messages). Not too bad. PopFile uses a Bayesian approach to classify messages; see the link above to download and configure it.
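For a feel of what "Bayesian" means here, below is a generic multinomial naive Bayes classifier with add-one smoothing. It is a toy sketch, not PopFile's code: PopFile has its own tokenizer, word database, and tuning, and the class name, tokenizer, and training messages below are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Deliberately simple tokenizer: lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Hypothetical toy classifier (not PopFile's implementation):
    multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {}          # bucket -> Counter of word frequencies
        self.doc_counts = Counter()    # bucket -> number of training messages

    def train(self, bucket: str, text: str) -> None:
        self.word_counts.setdefault(bucket, Counter()).update(tokenize(text))
        self.doc_counts[bucket] += 1

    def classify(self, text: str) -> str:
        # Requires at least one trained bucket.
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: fraction of training messages in this bucket.
            score = math.log(self.doc_counts[bucket] / total_docs)
            for word in tokenize(text):
                # Log likelihood, smoothed so unseen words don't zero out the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


nb = NaiveBayes()
nb.train("spam", "cheap meds buy now limited offer")
nb.train("spam", "buy cheap watches now")
nb.train("personal", "dinner with mom on sunday")
nb.train("personal", "photos from the weekend trip")
print(nb.classify("buy cheap meds"))  # spam
```

The per-bucket word counters here are loosely analogous to the per-bucket word counts PopFile shows; the decision depends on how often each word appears in each bucket, not on vocabulary size alone.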


