Matching entries from Skrentablog

Web robot names considered, and rejected

Google's is "Googlebot", Yahoo's is "Slurp", Cuill's is "Twiceler". It makes sense to have a friendly robot user agent, so nervous webmasters won't ban it. You don't want to call your crawler 'sitejacker' or something. Unfortunately my favorite candidates were: Crawlhammer...

Cuill is banned on 10,000 sites

Be careful while you debug your crawler... Webmasters these days get very touchy about letting new spiders walk all over their sites. There are so many scraper bots, email harvesters, exploit probers, students running Nutch on gigabit university pipes, and...

Who will stop Google from going to 90% market share?

Jason predicts Google will reach 90% market share. He makes a solid argument and covers the bases. Referred traffic today suggests Google is at about 85%. Ask just quit the game; MSN and Yahoo have put themselves into a tarpit. So the field...

Ranking Web 2.0 sites by server latency

Server latency is the start of the battle for site performance. There are great tutorials on how to optimise your html, but if your server takes too long sending the bytes out in the first place, there's nothing the browser...

The 11 startups actually crawling the web

The story goes that one day back in the 1940s, a group of atomic scientists, including the famous Enrico Fermi, were sitting around talking when the subject turned to extraterrestrial life. Fermi is supposed to have then asked, "So?...

Feed Subscription

If you use an RSS reader, you can subscribe to a feed of all future entries matching 'cuil'.