
Web robot names considered, and rejected

Google's is "Googlebot"
Yahoo's is "Slurp"
Cuill's is "Twiceler"

It makes sense to have a friendly robot user agent, so nervous webmasters won't ban it. You don't want to call your crawler 'sitejacker' or something. Unfortunately, my favorite candidates were:

Crawlhammer
Webraker
Lurchy
Client9

hmmm. :-|

"Oh no! It's CrawlHammer!!"

If even in your heart you hide the urls ... there it shall rake for them...

...

Does anyone know what the purpose of a '+' in front of a URL in the robot's user-agent is? Some sites put in the '+', others don't...

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)

Gigabot/3.0 (http://www.gigablast.com/spider.html)
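Either way, anyone reading logs has to cope with both styles. Here's a minimal sketch in Python of pulling the info URL out of strings like the ones above, treating a leading '+' as optional decoration. The `info_url` helper and its regex are my own hypothetical illustration, not anything the crawler operators publish, and the pattern only assumes the URL ends at whitespace, a semicolon, or a closing parenthesis.

```python
import re

# Hypothetical pattern: an optional run of '+' followed by an http(s) URL.
# The URL is assumed to end at whitespace, ';' or ')'.
INFO_URL = re.compile(r"\+*(https?://[^\s;)]+)")


def info_url(user_agent):
    """Return the info URL embedded in a user-agent string, or None."""
    match = INFO_URL.search(user_agent)
    return match.group(1) if match else None


AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)",
    "Gigabot/3.0 (http://www.gigablast.com/spider.html)",
]

if __name__ == "__main__":
    for agent in AGENTS:
        print(info_url(agent))
```

With or without the '+', the same URL comes out, so at least for log parsing the plus sign doesn't have to matter.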


Comments (5)

Mark:

My favorite robot name was the old Interpix robot, "iSpi"

This crawled for images for the Image Search feature that they supplied to Yahoo in the mid-to-late 1990s.

A bunch of Google old-timers came together today on an email thread to discuss the background on the '+'. I'll spare you the story and just let you know that you don't need to put a plus sign in the user-agent.

Thanks Matt! But I'd still love to hear the story... :)

I'd recommend something like Slimey, the worm that Oscar the Grouch watches over, but webmasters might be a bit leery of worms as well. :)

Let's dissect what the fears generally are:
1. It might go the way of Cuill and take down the damn webserver (we had to ban Cuill's IP range for doing this).
2. It might just be a scraper.

So, if you can get something that conveys the "I'll go slowly and not steal from you" message, win for you.

How about...
Safeslug
Snaildex
Charlotte (you know, from Charlotte's Web)

Here are some of my favorites from our logs:

DuckDuckBot/1.0: I'll play this with my kids this weekend.

focuseekbot: Do you pronounce that "the F-U seek bot"?

Following the + in the URL meme, how about ++ before https?
CityTwist/0.1;++https://


About

This page contains a single entry from the blog posted on April 16, 2008 9:29 AM.

The previous post in this blog was Cluster map propagation in Amazon Dynamo.

The next post in this blog is Microsoft "hits back" at Google with re-launch of 4-year old Newsbot.
