I have a pet peeve: people thinking that my situation is the same as theirs and giving me advice that is wrong for me.
I get around 2400 spam a day. I've tried three different spam filters recommended to me by people I trust and who generally know about these things, and they get about 97% to 98% accuracy. Worse, they produce false positives. With 50 spam per day, plus having to trawl through the spam bin to look for the false positives, I decided to write my own.
My spam filter is highly tuned to my traffic and I get about 10 spam through per day. Moreover, there are about 2 false positives per month, although that's very hard to quantify.
The problem with putting up a form is that people want a reply, and yet they can't type their own email address properly. About half the emails I get through my various forms have subtle and not-so-subtle misspellings of the return address, and it can cost me hours to track them down.
No, obfuscated email addresses are still my best tool in this situation.
I'm using Gmail for my domain and have received zero spam in well over a year. My email address is all over the web. What are you using for your spam filter?
I don't bother obfuscating either. I do get spam occasionally, but the gmail algorithm learns so once I mark a particular type of spam as spam it never comes up again.
I am less worried about receiving spam and more worried about not receiving good mails. Are you sure that Gmail's spam filter has zero false positives as well? If you are sure, how do you know? I can not check my spam folder manually because it contains thousands of mails.
At one point I was checking it religiously but I've come to trust it after never finding a legitimate mail in my spam folder after a long period of time. I still check it from time to time, and again, I don't find legitimate mail tagged as spam.
Most confirmation mails (and the like) end up in my Gmail junk folder. E-mails from normal people never do. And for me it has just become an automatic process to look in the junk folder for those.
I can't use gmail because I have the requirement to create email addresses on-the-fly. I have my own domain and can do that. The result is I need good spam filtering.
My filtering now achieves around 99.6% filtering, and about 1 detected false positive per month. It would be interesting to see how gmail copes with my 2400 spam per day, and what accuracy it achieves, but it's a non-starter because of how I use email.
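As a rough check on how those percentages translate into daily counts, here is a back-of-the-envelope sketch. The 2400-per-day volume and the accuracy figures are the ones quoted in this thread; the function name is only for illustration.

```python
def spam_getting_through(spam_per_day: int, accuracy: float) -> float:
    """Expected number of spam messages per day that slip past a filter."""
    return spam_per_day * (1.0 - accuracy)

# Off-the-shelf filters at 97% to 98%, and the custom filter at 99.6%.
for accuracy in (0.97, 0.98, 0.996):
    leaked = spam_getting_through(2400, accuracy)
    print(f"{accuracy:.1%} accurate -> roughly {leaked:.0f} spam per day in the inbox")
```

At this volume every tenth of a percent is worth two or three messages a day, which is why 98% can still feel unusable.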
I do this all the time with Google Apps to track who sells my email address, creating a new one for each place I sign up.
1. Create a catch-all address for the domain you're going to use that isn't the normal postmaster one.
2. Pick a three- or four-letter combination of letters that rarely appears in normal conversation (like dcj, for example).
3. Set a mail filter on the catch-all account to forward all mail that has your three-letter combination as a part of its recipient list to your real address.
This means that for every service you sign up for, you can create a new address (I always use domainname.code@mydomain) that is trackable and gets to you. For example, I just signed up with Via Rail's online system - using the address viarail.dcj@mydomain. It will get forwarded to my real address (since it's got the dcj in there), if I start getting spam on it I'll know where they got the email address from, and if it gets really bad I can just change my filters to block all email to viarail.dcj@mydomain.
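The forwarding rule in those three steps can be sketched roughly as follows. This is only an illustration of the logic, not how Google Apps filters are actually configured: `REAL_ADDRESS`, the `mydomain.example` domain and both function names are made up, and the sketch matches the tag as a `.dcj` suffix of the local part, which is stricter than "anywhere in the recipient list".

```python
from typing import Optional

TAG = "dcj"                      # the rarely-seen letter combination from step 2
REAL_ADDRESS = "me@example.com"  # hypothetical real mailbox

def route(recipient: str, blocked=frozenset()) -> Optional[str]:
    """Return the address to forward to, or None to drop the message."""
    recipient = recipient.strip().lower()
    if recipient in blocked:
        return None              # this per-service address has been burned
    local_part = recipient.partition("@")[0]
    # Forward only when the local part ends with ".<tag>", e.g. "viarail.dcj".
    if local_part.endswith("." + TAG):
        return REAL_ADDRESS
    return None

def source_of(recipient: str) -> str:
    """Recover which service a tagged address was handed to."""
    local_part = recipient.strip().lower().partition("@")[0]
    return local_part.rsplit(".", 1)[0]

print(route("viarail.dcj@mydomain.example"))      # forwarded to the real mailbox
print(source_of("viarail.dcj@mydomain.example"))  # the service that leaked it
```

Blocking a leaked address is then just a matter of adding it to the `blocked` set.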
You could do something with Gmail's plus-addressing, but I find that many services don't accept those email addresses.
Good idea. For those of us who use an @gmail.com address, here's what you can use in lieu of plus signs, which as JimmyL said often don't work:
Gmail lets you insert periods between the letters of your username. So if my email address is someusername@gmail.com, the following are valid variants of my email address:
some.username@gmail.com
some.user.name@gmail.com
s.omeusername@gmail.com
And so on. So you can use variants for different services you sign up for. Also note that you can substitute @googlemail.com for @gmail.com.
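To see how many variants that gives you, and how a receiving script could fold them back together, here is a small sketch. The function names are mine, and the normalisation simply encodes the behaviour described above (dots ignored, googlemail.com treated as gmail.com); it is not an official Gmail API.

```python
from itertools import product

def dot_variants(local_part: str):
    """Yield every way of placing single dots between the letters of local_part."""
    if not local_part:
        return
    gaps = len(local_part) - 1
    for choice in product(("", "."), repeat=gaps):
        pieces = [local_part[0]]
        for separator, char in zip(choice, local_part[1:]):
            pieces.append(separator + char)
        yield "".join(pieces)

def canonical_gmail(address: str) -> str:
    """Collapse a dotted variant back to the dot-free gmail.com form."""
    local_part, _, domain = address.lower().partition("@")
    if domain in ("gmail.com", "googlemail.com"):
        local_part = local_part.replace(".", "")
        domain = "gmail.com"
    return f"{local_part}@{domain}"

print(list(dot_variants("abc")))
print(canonical_gmail("some.user.name@googlemail.com"))
```

A username of n letters has 2^(n-1) dotted forms, so a 12-letter name like someusername already gives 2048 distinct addresses.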
Yahoo Mail offers a service called AddressGuard that does exactly this, but it's only available to paid accounts. You can also easily use these as your from address, so even in direct correspondence, you still shield your primary address from the recipient. (This allowed me to verify that eMusic sells their subscriber list to spammers.)
A free alternative is SneakEmail (http://www.sneakemail.com) which allows you to set up disposable addresses that forward to your primary account, and allows you to set up pre-forwarding filters. They also create a unique address for the sender of each email, and you can set up your SneakEmail filters to insert this as the reply-to address of each email you receive.
SneakEmail is great; it's what I use when I'm seriously suspicious of who I'm sending mail to.
For 95% of the things I sign up for, however, I'm not that paranoid - the slight decrease in security is offset by the added convenience of not having to log into a third-party service (like SneakEmail) to get what I'm doing done.
I use a custom domain, but have it set up to forward all addresses to Gmail, mainly to use their spam filter. I'm not sure what your particular use-case for the on-demand addresses is, but you might be able to set up some Gmail label rules to sort it for you on that end too.
The only downside is that you have to manually verify your custom domain's 'sent from' addresses in Gmail, so you can't easily reply from arbitrary addresses.
Your visitors that can't spell their own email address are likely to screw your obfuscated address up as well. I've seen email in my outbound queue to whatever@domain.dot.tld and variations. I wouldn't be surprised if someone never realizes that AT is @ (in non-English-speaking locales, @=at is not at all obvious; in the Nordic languages it's known as (elephant's) trunk-A) and gives up before the mail leaves the client.
It hides your e-mail behind a CAPTCHA, but once it's solved it's an actual, well-formed address that can be copied or clicked. Also, it's very clear if you've solved the CAPTCHA - it's not always clear if you've solved the "riddle" of an obfuscation correctly.
> ... visitors that can't spell their own email address
> ... likely to screw your obfuscated address up as well
> ... it's not always clear if you've solved the "riddle" of an
> obfuscation correctly.
They are likely to screw it up, but at least they get the feedback of a bounced email. There is a recovery mode, and it's their problem.
With an incorrectly spelled return address on a form there is no recovery mode at all, and they get no feedback that it hasn't worked.
Isn't that the point of the article? That you are offloading work onto the people you ostensibly want to get in touch with you?
(I'm not saying it's wrong for you to do so, just that your statement seems to line up perfectly with the author's premise that obfuscating your email address is taking your problem and making it their problem).
The author suggests that using a form makes the problem go away. My experience says that people who can't de-obfuscate an email address frequently can't type their return address correctly. The obfuscated address gives them feedback in the form of a bounced email, and thereby gives a recovery mode. The incorrect return address in the form gives no recovery mode at all.
To me, that's conclusive proof that the form is worse than the obfuscated address.
Further, there's a balance to be achieved. I've analysed the types of people I want to contact me, and of those who can't work out my address there are two types. One type is those that I really don't care about - nuisance, spam, content-free, or time-wasters. The others that I do care about usually have a different route to me, one that's specifically tailored to them and made as easy as possible. That one has a specially designed anti-spam measure built into it.
I have to agree with your disdain for forms. I hope that when given the choice between obfuscating an email address and providing a form, our response will be to improve our spam filtering and offer a plaintext mailto: link.
Plaintext links have high usability. The mail is organized within the user's mail program where it can be retrieved, collated by subject, and so forth. It can be copied and pasted. It's the absolute best thing for them :-)
> but at least they get the feedback of a bounced email.
First, that depends. They may hit an existing domain with a catch-all mailbox configured, or they may hit a legitimate mailbox at your provider, where the receiver may or may not think to reply that they got the wrong address.
Second, what percentage of users (especially in the segment that might misspell their own email address) will know what to do with a bounce mail?
The author cannot be bothered to decode such email addresses. If that is the case I probably did not want his email in the first place. Ipso facto, my filter worked.
You are EXACTLY right - we display our jobs email address this way. We figure if someone can't figure out an email address that is presented this way, then we don't want to hire them.
I've used it for years with good results. People will even email me thinking that I've exposed my email address to spammers, encouraging me to use the blah [at] blah dot com style.
I always thought this sort of systematic obfuscation (s/@/at/, s/\./dot/, etc.) was as machine-readable as an actual email address. I've just gone on the assumption that there are spam harvesters out there using regexes that catch "bob at domain dot com" as well as bob@domain.com.
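To make the point concrete, here is roughly how little it takes to undo the at/dot style. This is a sketch of my own, not code from any real harvester; the pattern only covers the plain-word and bracketed forms, and the names are illustrative.

```python
import re

# "at" / "dot" either surrounded by whitespace or wrapped in [] or ().
AT = r"(?:\s+at\s+|\s*[\[(]\s*at\s*[\])]\s*)"
DOT = r"(?:\s+dot\s+|\s*[\[(]\s*dot\s*[\])]\s*)"
LABEL = r"[A-Za-z0-9-]+"

OBFUSCATED = re.compile(
    rf"(?P<user>[A-Za-z0-9._+-]+){AT}(?P<domain>{LABEL}(?:{DOT}{LABEL})+)",
    re.IGNORECASE,
)
DOT_RE = re.compile(DOT, re.IGNORECASE)

def harvest(text: str) -> list:
    """Return de-obfuscated addresses found in text, lowercased."""
    found = []
    for match in OBFUSCATED.finditer(text):
        domain = DOT_RE.sub(".", match.group("domain"))
        found.append(f"{match.group('user')}@{domain}".lower())
    return found

print(harvest("mail bob at domain dot com or bob [at] domain [dot] com"))
```

It will also produce junk from ordinary prose such as "look at this dot matrix", but a harvester can afford false positives far more easily than a site owner can afford spam.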
Given the large number of blog, wiki and CMS engines which use the same cargo-cult security idea, it's probably considerably higher than 10%. If you're getting paid to harvest addresses, wouldn't you write a single regexp to increase the number of good addresses?
Sure I would, but like most half-way decent programmers, you couldn't pay me enough to code spam bots.
I've blocked a huge volume of comment spam on my sites by blocking certain malformed HTTP headers. The authors couldn't be bothered to check if they were getting it right. I don't think most spam bot authors are A) very well paid or B) very good.
Don't forget that it's a dynamic system. As more of the low-hanging fruit emails get picked by spam email harvesters, there is more value in the harder-to-decode emails since they haven't been spammed. There is a tipping point where it would be "worth it" for someone to start decoding the harder types of obfuscated emails.
Possibly, but couldn't you argue that the low-hanging fruit email addresses are more likely to be profitable to spammers? Which of these two internet users is more likely to buy your replica rolex: danny@aol.com, or AOL: danny (or danny at aol, or danny+don'tspamme at teh a oh l's dot com)?
My point is that users smart enough to disguise their emails from spammers are more likely to be wary of their wares.
I don't think it even makes sense to harvest these addresses. The people who write email [AT] address [dot] com are MUCH less likely to purchase their Viagra online in your shop just because you send them an email :)
Every technique reduced spam by 60% or more, and even the dumb-as-rocks replacement of '@' and '.' with 'AT' and 'DOT' reduced the volume by size by over 99%.
Based on that analysis, javascript "scrambling" is the preferable method: It is 100%(1) effective and has no usability implications.
1: The analysis runs 1.5 years until July 2008 -- one must assume that crawlers have become more sophisticated. I.e. building the DOM, executing any javascript and then searching all visible text isn't that difficult, and less so now than in 2007.
It's still far from trivial if you're doing this on millions of pages, especially as you'll have to sandbox the JS in some way, which may or may not subtly break things in other ways. I suspect the effort isn't worth it.
Email harvesters probably don't want to run javascript because then they would be open to traps (like infinite loops or other cpu consuming scripts) that could be targeted at them.
There are ways to trick bots. 4chan used to have a second field named 'email' in their submission form that was hidden with the value set to "DO NOT PUT ANYTHING HERE" (or something similar) and spammers would blindly fill both email fields (unless someone was specifically targeting 4chan). I'll bet there are plenty of ways to get an email-crawler to click on some link that a normal person would not.
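The server-side half of that trick is tiny. A minimal sketch, assuming the submitted form arrives as a dict; the field name and sentinel text follow the description above, and everything else (the function name, treating a missing field as harmless) is my own choice.

```python
HONEYPOT_FIELD = "email"                     # hidden decoy field, name as described above
HONEYPOT_VALUE = "DO NOT PUT ANYTHING HERE"  # value the page pre-fills it with

def is_probably_bot(form_data: dict) -> bool:
    """A human never sees the hidden field, so it should come back unchanged."""
    # A missing field is treated as unchanged here; a stricter site could reject it.
    return form_data.get(HONEYPOT_FIELD, HONEYPOT_VALUE) != HONEYPOT_VALUE

human = {"email": "DO NOT PUT ANYTHING HERE", "comment": "hello"}
bot = {"email": "buy@pills.example", "comment": "cheap pills"}
print(is_probably_bot(human), is_probably_bot(bot))
```

As noted above, this only catches bots that fill fields blindly; anyone targeting the site specifically will just leave the decoy alone.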
Such traps can be placed on decoy pages that users and good robots are unlikely to visit. Or they can sit in code that legitimate users execute rarely -- when clicking a link to send a single legitimate email -- but that email harvesters execute in excess.
Especially obfuscations like "bob at gmail dot com". If I were a spambot I would just read until the first space and append "@gmail.com" and "@hotmail.com" and 5-6 other mail providers. This would break through 90% of people doing that, and they're high value targets too - they feel secure with their clever obfuscation and chances are their other anti-spam tools (and reflexes) are weaker.
But you're not a spam bot author. Based on what I've seen, most spam bot authors are pretty bad programmers.
Also, I would debate that these are high value targets. I'd wager that people who go out of their way to obscure their addresses are much less likely to purchase fake pills or fall for a phishing email than the average user.
With your method, you would have to get an email address after every word: in your own words "i would just read until the first space and append". At the first space ("bob "), you have no indication this is a valid prefix of an email address.
Secondly, I don't see how bob is a high value target. Someone who knows what spam is, and knows what a spam bot is, and knows they want to obfuscate it, is probably not someone who would make a purchase if they did receive spam.
I think that's one reason spam bot crawlers don't try that hard to de-obfuscate addresses: the recipients are of less value than those unobfuscated.
Yes. I've never bothered to obfuscate my email for exactly that reason, rather than for reasons of user convenience. If I were writing a program to crawl web pages for email addresses it would be extremely easy to include most of the common obfuscation patterns. If you really want to obfuscate then use some kind of image of the text, which is less easily machine readable.
Add a plus to your name whenever you put your email address into a form on a website. It will continue to show up as though the +whatever were not present.
I try to use that all the time, except that everybody likes to roll out their own email validation, and they are all wrong, they do not allow the + sign as they should.
The RFC is extremely permissive, even spaces (yeah, spaces) are allowed in email addresses.
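For anyone rolling their own handling, here is a sketch of treating the plus sign properly. It assumes the provider delivers user+tag@host to user@host, as described above for Gmail; the function names are mine, and the plausibility check is deliberately loose rather than an RFC validator.

```python
def split_plus_tag(address: str):
    """Split 'user+tag@host' into ('user@host', 'tag'); tag is '' if absent."""
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign:
        raise ValueError(f"no @ in {address!r}")
    base, _, tag = local_part.partition("+")
    return f"{base}@{domain}", tag

def plausible_address(address: str) -> bool:
    """Loose check: something, an @, something. Does not reject '+'."""
    local_part, at_sign, domain = address.rpartition("@")
    return bool(at_sign and local_part and domain)

print(split_plus_tag("someusername+viarail@gmail.com"))
print(plausible_address("someusername+viarail@gmail.com"))
```

The only reliable test of an address is sending mail to it, so a validator that errs on the permissive side loses very little.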
I tried doing something like that, but it ended up being a hassle using my multiple email address versions to log in to sites. Actually, it would be nice to have a browser smart enough to do something like this automatically. I'll bet there's a FF plugin for that... but I use Chrome.
My spam avoidance scheme is to own my own domain, and then have *@mydomain.com go to my inbox, so I can create disposable addresses for any use at any time.
For instance, I always sign up with [servicename].account@mydomain.com and only ever give out my personal e-mail to people I meet in person.
The great thing is that if a [servicename].account address starts getting spam, I know which service sold my address and I can just blackhole that address.
That way, I never have to obfuscate my address, since I'll always just create a new one for the specific need. It's probably not for everyone though...
I just use SpamAssassin with my domain name and I get about 2 spam messages in my inbox per day, but the spam box gets around 1000 per day. I post my email address on websites because I want to have zero barriers when a customer or lead needs to contact me. It's NOT THEIR PROBLEM that I get spam.
Unless you are really desperate for people to contact you, forcing them to "decode" the email address might be a nice velvet rope that keeps out those who aren't worthy of your time.
Obviously this might not work if you have a customer support email.
No, spam filters are not pretty good. Instead, email has now become an unreliable channel for me. I get so much spam that I have to enable automatic filtering. That means I probably miss non-spam emails occasionally. And some spam still slips through the automatic filter.
I couldn't agree more with the post. I've had my primary email address sitting out on my contact page for years and have never spent a significant amount of time dealing with spam. At this point, obfuscated addresses are as archaic as animated "under construction" GIFs and the blink tag.
Using the de-obfuscation process as a gating mechanism is arbitrary and perhaps a bit arrogant. There's no reason to assume that familiarity with this convention equates to intelligence or value. Even now, I still run into plenty of people in businesses outside of the tech industry that confuse website and email address formats. That doesn't make them "unworthy" of contacting us; it just means they have a different skillset and knowledge than we do.
Meanwhile, email spammers and scammers are most likely to understand these conventions, since it's their "job" to do so.
One possible solution to the problem is to use dynamically generated throw-away email-addresses. You can also encode some kind of signature. Then just hide the monstrously huge address behind a pretty "Please Email me" Link:
If spam increases (or, say, whenever more than 3 emails have been received at one particular address), you shut down the address.
I once implemented the receiving part (HMAC/signature checking) for the Exim mail server and a general address-generator class in Python and PHP, but never actually put it to use.
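For the curious, the idea can be sketched in a few lines of Python. This is not the implementation mentioned above, just an illustration using the standard hmac module; the secret, the domain, the token.signature address layout and the in-memory revocation set are all assumptions.

```python
import hashlib
import hmac

SECRET = b"change-me"    # hypothetical secret key, kept on the mail server
DOMAIN = "example.com"   # placeholder domain
REVOKED = set()          # addresses that have been shut down

def _signature(token: str, length: int = 10) -> str:
    digest = hmac.new(SECRET, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

def make_address(token: str) -> str:
    """Build a throw-away address whose local part carries an HMAC of the token."""
    token = token.lower()
    return f"{token}.{_signature(token)}@{DOMAIN}"

def verify_address(address: str) -> bool:
    """Accept mail only for addresses we generated and have not revoked."""
    address = address.strip().lower()
    if address in REVOKED:
        return False
    local_part, _, domain = address.partition("@")
    token, dot, signature = local_part.rpartition(".")
    if not dot or domain != DOMAIN:
        return False
    return hmac.compare_digest(signature.encode(), _signature(token).encode())

address = make_address("contact-page-1")
print(address, verify_address(address))
REVOKED.add(address)     # e.g. after more than 3 mails arrived at it
print(verify_address(address))
```

Because the signature is checked on receipt, the server needs no database of issued addresses, only the secret and the list of revoked ones.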
I wouldn't obfuscate the support or billing contact address for my company, but on my personal site? Or code documentation posted on the web? You betcha.
There's a big difference between pushing work on your customers and pushing work on people who don't know you and are trying to contact you the first time.
The two main suggestions to allow people to contact you without getting lots of spam:
1) Use Gmail. Google has written their search engine to know how to detect spam so they know how to stop spam from reaching your email address. I don't think I have ever had even a single spam message even reach my spam folder, and that's thanks to Google. I could put my non-obfuscated email address all over the place and not have to worry.
2) Write a custom contact form. It is not that hard if you know even a little bit of PHP. And if you don't you can always use Zoho forms or some other free online form creator. I never put up contact emails because they are, in my opinion, just as unprofessional as an obfuscated contact email. Contact forms are much more professional.
Looks like we're missing an important question here.
How do spammers get the email addresses? Do spam bots really crawl every web page?
Some notes I've picked up from the Internet on minimizing spam email:
- Never use a third-party proxy or anonymizing network (e.g. Tor). I once worked for a company that did not allow any port except 8080, so for several months I used Tor. Suddenly, after several weeks, my spam folder filled up with junk email.
- Make sure you clear all your caches and cookies _before_ and after browsing for pr0n. duh! :)
- Never use any third-party application from Facebook/MySpace/any social network, unless you are using a non-private email address on that account.
- Do not read spam email. If you know that an email is spam, just tick it and delete it, or let the system delete it automatically, like Gmail does. I do not know anything about the SMTP protocol, but there is a feature that notifies the sender when you read an email; by opening it you have just told the spammer that your address is still active.
Spammers, in the sense of the people gathering email addresses, are not stupid. They know what they're doing.