January 6, 2011

Introducing the Spam Clock


I consider myself a glass-half-full kind of guy, but it's hard to remain optimistic about the future of the World Wide Web. I think it's fantastic that my kids have access in real time to almost every piece of information and knowledge in the world. But ever since we started working on Blekko, I've been exposed to the dark side of the Internet.

Scratch below the surface of all this great information, or in our case dig deep below the surface, and it is shocking what is happening to the Internet. Millions upon millions of pages of junk are being unleashed on the web, a virtual torrent of pages designed solely to generate a few pennies in ad revenue for their creators. I fear that we are approaching a tipping point, where the volume of garbage soars beyond and overwhelms what is valuable on the web. Look at what has happened to email: Microsoft estimates that 90 percent of the mail that passes through its Hotmail servers is spam.

What happened to email was the result of very powerful economics. Spammers and con artists discovered they could reach a massive audience for pennies. And an audience of that scale essentially guaranteed a very small but profitable return. Today the economic incentives for web spammers are even stronger than those for email spammers and almost guarantee a continued blizzard of trash on the web.

Web spammers simply have to create pages on the web and sit back and let search engines send them money. Current search engines have abandoned any attempt to enforce even the slightest modicum of quality control. Revenue is guaranteed if a page can draw a click.

The result is a global sweatshop workforce cranking out millions of pages of web trash. I fear we are looking at the very scary future of the web in the job postings at Mechanical Turk. Researchers recently reviewed job postings there and found that 41 percent of all jobs offered over a two-month period were aimed at recruiting workers to create spam. Most of these jobs offered folks a measly dollar a page. Some paid as little as 5 cents. But all these jobs are being filled and the spam gets spewed out.

("The most infamous girl in the history of the Internet")

Consider that in 2000 there were about 7 million hosts on the Internet offering essentially all the content on the web. By 2010, the number of web hosts had soared to 250 million. How many of these 200-million-plus hosts offer legitimate content? A small fraction. The rest is spam.

Which brings me to my larger point. This spam on the web is creating REAL problems that are affecting much more than our ability just to find information.

The energy and other costs of crawling, storing and serving this trash are soaring. I saw a recent estimate that 15% of the world's energy consumption in 10 years could go to support Internet usage. A fair amount of that energy is being burned by the thousands upon thousands of servers at incumbent search engines. Making search greener by weeding out spam could have a significant impact on energy consumption.

The problems and challenges that spam poses to the entire world are going to get worse. As the online economy continues to grow at double-digit rates while growth in the offline economy stalls, the incentives for spammers grow even stronger.

That's why we've created the world's first Spam Clock. This clock is going to record in real time the amount of web spam that is being spewed out. The clock is designed to bring greater attention to this growing problem. While it is illustrative more than scientifically accurate, it is truly indicative of the soaring spam problem.

Finally, what can we do about this? Honestly, we think our search engine can be an important solution but we need your help. If we can together create a search engine that is a curated resource of the best trusted sources on the web, we can do a great deal to reduce the economic incentive for creating spam. Spam operators won't even offer that nickel on Mechanical Turk if the chances are pretty good that a human editor will never include that page in the search database.

So we'd like to invite web searchers everywhere to help us clean up the web. It can be done. If we can just organize the best sources of information for the top 1000 search verticals, we will drastically improve the web experience. And we will immediately create the first-ever disincentive for polluting the web.

Please join us.

