Interesting email spam statistics

So I was looking at PopFile’s statistics page.

Messages Classified

Bucket         Classification Count   False Positives   False Negatives
other          0 (0.00%)              0                 0
personal       34,557 (89.30%)        60                94
spam           3,978 (10.28%)         79                108
work           39 (0.10%)             6                 2
unclassified   122 (0.31%)            59
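
As a quick sanity check, the total volume and the spam share can be re-derived from the bucket counts. This is only a small Python sketch with the counts copied from the table above; the variable names are my own and have nothing to do with PopFile itself.

```python
# Bucket counts copied from the statistics table above.
counts = {
    "other": 0,
    "personal": 34_557,
    "spam": 3_978,
    "work": 39,
    "unclassified": 122,
}

# Total messages is simply the sum over all buckets.
total = sum(counts.values())

# Spam share as a percentage of everything classified.
spam_share = 100 * counts["spam"] / total

print(f"total messages: {total:,}")
print(f"spam share:     {spam_share:.2f}%")
```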

This is really interesting. Of my entire email volume of 38,696 messages (and this doesn’t include Gmail!), only 3,978 were spam, or about 10% of incoming mail. But the word counts are more interesting:

Bucket Name    Distinct Words
other          0
personal       10,113
spam           10,313
work           699
unclassified

Notice that the distinct word counts for personal and spam differ by only 200?

But the best part:

Classification Accuracy

Messages classified: 38,696
Classification errors: 204

Accuracy: 99.47%
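
The accuracy figure follows directly from the two numbers above it. Here is a minimal Python sketch of the arithmetic; the observation that the false negatives add up to the error count is my own reading of the table, not something taken from PopFile's documentation.

```python
# Figures copied from the statistics page.
messages = 38_696
errors = 204

# False negatives per bucket, from the first table.
false_negatives = {"personal": 94, "spam": 108, "work": 2}

# The false negatives happen to add up to the reported error count.
fn_total = sum(false_negatives.values())

error_rate = 100 * errors / messages  # percent of messages misclassified
accuracy = 100 - error_rate           # percent classified correctly

print(f"false negatives: {fn_total}")
print(f"accuracy:        {accuracy:.2f}%")
print(f"error rate:      {error_rate:.2f}%")
```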

99.47%, so a 0.53% error rate. Not too bad. I use PopFile, which takes a Bayesian approach to classifying messages. See the link above to download and configure it.
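
To give a feel for what "Bayesian" means here, below is a toy naive Bayes classifier in Python. It is a rough sketch of the general technique, not PopFile's actual code: the `NaiveBayes` class, the whitespace tokenizer, the add-one smoothing, and the training messages are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy word-count classifier; hypothetical, not PopFile's implementation."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        # Tokenize naively on whitespace and tally the words for this bucket.
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        words = text.lower().split()
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_messages = sum(self.message_counts.values())
        best_bucket, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # Log prior: how often this bucket was seen during training.
            score = math.log(self.message_counts[bucket] / total_messages)
            bucket_total = sum(counts.values())
            for w in words:
                # Log likelihood with add-one smoothing for unseen words.
                score += math.log((counts[w] + 1) / (bucket_total + len(vocab)))
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


nb = NaiveBayes()
nb.train("spam", "cheap pills buy now")
nb.train("spam", "buy cheap watches now")
nb.train("personal", "lunch on friday with mom")
nb.train("personal", "are we still on for friday")

print(nb.classify("buy cheap pills"))
print(nb.classify("see you friday"))
```

Each bucket keeps its own word tallies, which is why a statistics page like the one above can report distinct words per bucket. Correcting a misclassified message amounts to training the right bucket with it.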
