Yahoo's is "Slurp"
Cuill's is "Twiceler"
It makes sense to have a friendly robot user agent, so nervous webmasters won't ban it. You don't want to call your crawler 'sitejacker' or something. Unfortunately, my favorite candidates were:
Crawlhammer
Webraker
Lurchy
Client9
hmmm. :-|
"Oh no! It's CrawlHammer!!"
If even in your heart you hide the urls ... there it shall rake for them...
...
Does anyone know what the purpose of a '+' in front of a URL in the robot's user-agent is? Some sites put in the '+', others don't...
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)
Gigabot/3.0 (http://www.gigablast.com/spider.html)
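If you're on the other side and reading these strings out of your logs, the '+' is easy enough to ignore. Here's a rough sketch in Python that pulls the info URL out of a user-agent whether or not it has the plus sign. The function name crawler_info_url and the regex are my own illustration, not anything the crawlers themselves document, and it assumes the URL ends at whitespace, a semicolon, or a closing parenthesis.

```python
import re

# Assumption: the info URL starts with http:// or https://, may be preceded
# by one or more '+' signs, and ends at whitespace, ';' or ')'.
URL_RE = re.compile(r"\+*(https?://[^\s;)]+)")


def crawler_info_url(user_agent):
    """Return the first URL embedded in a user-agent string, or None."""
    match = URL_RE.search(user_agent)
    return match.group(1) if match else None


# User-agent strings quoted in the post above.
agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)",
    "Gigabot/3.0 (http://www.gigablast.com/spider.html)",
]

for ua in agents:
    print(crawler_info_url(ua))
```

With or without the '+', each of the strings above yields just the bare URL, so for log analysis the prefix doesn't seem to matter.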
Comments (8)
My favorite robot name was the old Interpix robot, "iSpi".
It crawled for images for the Image Search feature that they supplied to Yahoo in the mid-to-late 1990s.
Posted by Mark | April 16, 2008 7:33 PM
A bunch of Google old-timers came together today on an email thread to discuss the background on the '+'. I'll spare you the story and just let you know that you don't need to put a plus sign in the user-agent.
Posted by Matt Cutts | April 16, 2008 7:40 PM
Thanks Matt! But I'd still love to hear the story... :)
Posted by Rich Skrenta | April 16, 2008 11:04 PM
I'd recommend something like Slimey, the worm that Oscar the Grouch watches over, but webmasters might be a bit leery of worms as well. :)
Let's dissect what the fears generally are:
1. It might go the way of Cuill and take down the damn webserver (we had to ban Cuill's IP range for doing this).
2. It might just be a scraper.
So, if you can get something that conveys the "I'll go slowly and not steal from you" message, win for you.
How about...
Safeslug
Snaildex
Charlotte (you know, from Charlotte's Web)
Posted by Cygnus | April 17, 2008 7:26 AM
Here are some of my favorites from our logs:
DuckDuckBot/1.0: I'll play this with my kids this weekend.
focuseekbot: Do you pronounce that the F-U seek bot?
Following the + in the URL meme, how about ++ before https?
CityTwist/0.1;++https://
Posted by Jason Culverhouse | April 18, 2008 9:35 AM
I would use funny as my approach to gaining brand recognition. Call it the FART crawler. It will be on Yahoo! News tomorrow morning :))
Posted by Web Design Taxi | June 29, 2008 4:11 PM
Well, it seems the decision is made. Today I saw a visit from Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/).
When I first saw it in my logs I was suspicious and thought "yet another content thief", but the name and the landing page are indeed friendly enough to let this crawler crawl. :-)
Posted by Andreas | September 16, 2008 3:48 AM
Hahahaha, I reached this blog because of the ScoutJet name in my logs. It's cool! Congrats :)
Posted by Rafael Sanches | February 23, 2009 2:18 PM