Did Powerset outsource their crawl?
ec2-67-202-8-249.compute-1.amazonaws.com - - [28/Mar/2008:23:31:06 -0700] "GET /2006/12/scale_limits_design.html HTTP/1.0" 200 11526 "http://www.skrenta.com/2006/12/i_took_a_ukulele_lesson_once.html" "zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) [email:crawl@powerset.com,email:paul@page-store.com]"
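An entry like this carries enough to identify who is crawling. The `.amazonaws.com` hostname points at EC2, and the user-agent names the crawler, its version, a project URL and contact addresses. Below is a minimal Python sketch of pulling those fields out. It assumes the line is in the standard Apache "combined" log format. `describe_crawler` and its output keys are hypothetical names chosen for illustration, and a hostname-suffix check is only a heuristic, not proof of where a request came from.

```python
import re

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def describe_crawler(line: str) -> dict:
    """Extract crawler-identifying fields from one log line (hypothetical helper)."""
    m = COMBINED.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    host, agent = m.group("host"), m.group("agent")
    version = re.search(r"heritrix/([\d.]+)", agent)
    url = re.search(r"\+(https?://[^)\s]+)", agent)
    return {
        "host": host,
        # Heuristic only: EC2 instances get reverse-DNS names under amazonaws.com.
        "on_ec2": host.endswith(".amazonaws.com"),
        "heritrix_version": version.group(1) if version else None,
        "project_url": url.group(1) if url else None,
        # Heritrix-style agents list contacts as "email:addr" inside [...].
        "contacts": re.findall(r"email:([^,\]\s]+)", agent),
    }


LINE = (
    'ec2-67-202-8-249.compute-1.amazonaws.com - - [28/Mar/2008:23:31:06 -0700] '
    '"GET /2006/12/scale_limits_design.html HTTP/1.0" 200 11526 '
    '"http://www.skrenta.com/2006/12/i_took_a_ukulele_lesson_once.html" '
    '"zermelo Mozilla/5.0 compatible; heritrix/1.12.1 (+http://www.powerset.com) '
    '[email:crawl@powerset.com,email:paul@page-store.com]"'
)

if __name__ == "__main__":
    info = describe_crawler(LINE)
    print(info["heritrix_version"])  # 1.12.1
    print(info["contacts"])  # ['crawl@powerset.com', 'paul@page-store.com']
```

The second contact address is what raises the question of who page-store.com is.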
They're using the open-source Heritrix crawler, running out of Amazon Web Services. But who is page-store.com? From their site:
Vertical search sites are relatively costly to operate. A single vertical search engine may need to sweep all or a large part of the web selecting the pages pertinent to a small set of topics. Startup and operating costs are proportional to the input page set size, but revenue may be only proportional to the size of the selected subset.

Page-store positions itself as a web wholesaler, supplying page and link information to vertical search engine companies on a per-use basis. The effect is to level the playing field between vertical search and general horizontal internet search.
Page-store can provide
- selected page feeds based on deep web crawls
- page metadata
- black-box filters
- anchor text results
- link information
This happened to KFC with the Colonel... he started out as a realistic line drawing of Colonel Sanders with the company name, "Kentucky Fried Chicken." After the waves of rebranding stylists were done with him, he was an abstract cartoon. They couldn't stop there and abbreviated the company name. You wouldn't want to realize you're eating FRIED CHICKEN when you're at KFC, after all. You probably want to be eating a healthy salad with dressing on the side.