Unfortunately the dontdelete tool doesn't seem to work anymore. Smells like someone loaded it into MySQL and the database isn't running anymore or something. So I hacked up a quick little replacement for the purpose I was using it for: browsing a random user session. I based this on my joke code, so it would be fast. It's fast! 2-3ms to return a random session from the 577,663 available.
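For the curious, here is a minimal sketch of one way to get that kind of lookup speed. It is not necessarily how my tool does it. It assumes the sessions live in a flat text file with one session per line (queries separated by tabs); that layout and the names `build_offset_index` and `random_session` are made up for illustration. The idea is to scan the file once to record where each line starts, then seek directly to a random line instead of asking a database.

```python
import random


def build_offset_index(path):
    """Scan the file once and record the byte offset where each line starts.

    Assumption: one session per line. The real data layout may differ.
    """
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def random_session(path, offsets, rng=random):
    """Seek straight to a randomly chosen session and read only that line."""
    with open(path, "rb") as f:
        f.seek(rng.choice(offsets))
        return f.readline().decode("utf-8").rstrip("\n")
```

Build the index once at startup and keep it in memory; each request then costs one seek and one short read, no matter how many sessions are in the file.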
Why is this useful? When you get past the voyeuristic fun, I've found that it's actually really hard to think up representative random searches to try out on search engines to see how they do. I've never been very good at this; someone sends me to a new search engine, and I type 'skrenta', and then I go blank. Mike typed 'britney spears' when I showed him AskX. The problem is that 'britney spears' has been hand-optimized at Yahoo, Google, MSN and Ask, because there are guys just like us working at all of those companies. It's supposedly a popular query category, it's obviously monetizable, and it's easy to license the AMG or Muze data and make the results better. But I have this nagging suspicion that 'skrenta' and 'britney spears' aren't serving me very well to take effective soundings of a new engine's quality.
Hence my random search tool. Real users type such gonzo stuff into the search box. You can't make this stuff up, which is the point. I included fresh-window links to a basket of other search engines, so you can see how the query does on each of them.
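Generating the basket of links is just URL encoding. A rough sketch follows; the URL templates and parameter names are my assumptions about each engine's public search URL, not necessarily what the tool emits, and `engine_links` is a made-up name.

```python
from urllib.parse import urlencode

# Assumed search URL prefixes and query parameter names (illustrative only).
ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
    "Ask": ("https://www.ask.com/web", "q"),
}


def engine_links(query):
    """Return {engine name: search URL} with the query safely URL-encoded."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in ENGINES.items()
    }


def as_html(query):
    """Render the links so each opens in a fresh window (target="_blank")."""
    return " ".join(
        f'<a href="{url}" target="_blank">{name}</a>'
        for name, url in engine_links(query).items()
    )
```

Calling `engine_links("will anastasia hurt my pregnancy")` gives one ready-to-click URL per engine for the same query.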
My all-time favorite so far: [will anastasia hurt my pregnancy]
Easy for a human to correct! You know what she means: "anesthesia", i.e. what are the risks of pain meds during pregnancy, getting an epidural, etc. But no search engine can do that phonetic correction yet based on the greater context of the sentence. Maybe Powerset is working on stuff like this.
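To make the "phonetic" part concrete, here is a small sketch using classic Soundex, which I'm swapping in purely as an illustration; I'm not claiming any engine uses it. It shows that the two words sound alike to a simple algorithm. It does nothing about the hard part, which is using the rest of the sentence to decide that a perfectly valid word is the wrong one.

```python
# Soundex consonant groups: letters in the same group share a digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for a word ("" if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of same-coded consonants
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def sounds_alike(a, b):
    """True when both words map to the same Soundex code."""
    return soundex(a) == soundex(b)
```

Here `sounds_alike("anastasia", "anesthesia")` is true, since both reduce to the same code. Knowing that "pregnancy" should tip the balance toward the medical term is the part that still needs a human, or something much smarter.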
Give it a try here: