Tainted Glass

Spam Wars

A few weeks ago, I was getting about 20 spam messages a day, but they all had a weird character set. As a temporary fix, I set up a filter that immediately trashed any email that contained the words "Gµarántêe", "sêxûaÌ", or "Ênlárgëmeñt".
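For the curious, a filter like that can be sketched in a few lines of Python. This is only an illustration, not what my mail client actually runs: the names BLOCKED_WORDS and should_trash are made up for this example, and the matching is a plain case-insensitive substring test.

```python
# Hypothetical blocklist, copied from the three words mentioned above.
BLOCKED_WORDS = ["Gµarántêe", "sêxûaÌ", "Ênlárgëmeñt"]


def should_trash(body, blocked=BLOCKED_WORDS):
    """Return True if the message body contains any blocked word.

    Both sides are case-folded so that the match ignores case.
    """
    folded = body.casefold()
    return any(word.casefold() in folded for word in blocked)


if __name__ == "__main__":
    print(should_trash("We Gµarántêe results in 30 days"))
    print(should_trash("Lunch at noon on Friday?"))
```

The obvious weakness is that it only catches exact spellings, so a spammer who swaps one accented character slips right through.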

The results were a stunning success: my spam levels dropped to about 5 per week. I was ecstatic, having won a little battle against the spammers.

Sadly, though, victory was fleeting. The spam is back, though it is a different type of spam. If you are spam-prone, you may have noticed that spammers now include random words at the bottom of their messages. For example, at the bottom of my latest spam was the following text:

reach worth montmartre alberta proclamation teacup cruddy candid disembowel hair missy verge bellwether waistline encyclopedic defector reprise chagrin rapt wail gooseberry manor airtight codeposit stellar wreathe garibaldi cliffhang flintlock wondrous punish cellar sensuous wizard chairwomen salad ebb marlborough chunk chose boar huntsville consonantal recompense moonlit

Fascinating stuff. This is relatively low-level protection, of course; I have seen some spam messages that place actual randomly generated stories at the bottom. This is probably done to combat a spam-filtering technique known as Bayesian filtering. Essentially, this type of filter looks at words that are common in the legitimate email you get and words that are common in the spam you get, and assigns a "spam-likelihood" probability to each message. For example, a word like "unsubscribe" would be very spammy, but a word like "Diplomacy" would be very unspammy for me in particular, since much of my regular email contains that word.
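To make the idea concrete, here is a toy sketch of that kind of scoring in Python. It is a simplified naive Bayes calculation under my own assumptions (equal prior odds, words treated as independent, add-one smoothing, a tiny invented training set), not the exact algorithm from the article, and the function names train and spam_probability are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each word, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word evidence into a single spam probability.

    Assumes equal prior odds and independent words (the "naive" part).
    Add-one smoothing keeps unseen words from producing zeros.
    """
    log_odds = 0.0
    for word in set(text.lower().split()):
        p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_given_spam / p_word_given_ham)
    return 1 / (1 + math.exp(-log_odds))


# Invented training data, purely for illustration.
SPAM = ["unsubscribe now to claim your prize", "click here to unsubscribe"]
HAM = ["diplomacy game starts tonight", "your move in diplomacy"]
SPAM_COUNTS, HAM_COUNTS = train(SPAM), train(HAM)


def score(text):
    """Score a message against the toy training data above."""
    return spam_probability(text, SPAM_COUNTS, HAM_COUNTS, len(SPAM), len(HAM))


if __name__ == "__main__":
    for message in ("please unsubscribe",
                    "diplomacy tonight",
                    "please unsubscribe diplomacy"):
        print(f"{message!r}: {score(message):.2f}")
```

In this toy setup, appending a word that shows up mostly in my legitimate mail drags the score of a spammy message back down, which is presumably what the spammers are hoping their padding will do.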

(I should point out that the linked article above is long and a bit technical at the beginning, but just ignore the algorithms; it's fairly interesting stuff.)

When I first read the article on Bayesian filtering, it sounded like the ultimate panacea, but my ardor was rapidly cooled by this chilling analysis:

This is where the unparalleled strength of the Bayes-type filtering becomes the very makings of unmitigated disaster, as the second-order effects come into play. Consider the question What does it mean for an spam email to make it past a Bayes filter?

The proper answer to that is a mathematical one, but in English, it means that there are no obvious cues that the message is spam. Nothing obvious in the title, nothing obvious in the text, no key words used only in spam, nothing.

He then gives an example of the type of spam we should expect to see shortly:

That's a nice point, but I think you should consider the information at http://www.somewebsite.com/info.html before going with that approach. I found that information to be really pertinent

A message like that cannot be filtered by any antispam software, since it looks exactly like a real message. No key words to hit, nothing to distinguish it from normal email.

But that's not even the scary part. As Jeremy points out, the truly horrifying aspect of this inevitable evolution of spam is that humans will have a hard time recognizing it as spam. Right now we can see which messages are spam almost immediately and delete them, but imagine if that became impossible?

In some doomsday scenarios, spam will spin out of control within a year or two. Many solutions have been discussed; one of the more reasonable is a charge for sending email. ISPs could even make the first 10,000 email messages a month free to subscribers, and that would still stop spammers cold in their tracks.

I think that a charge for sending email is inevitable. The only question is: will we be forced to suffer through a "dark age" of email before this change occurs? I certainly hope not.

Thursday, April 15, 2004
