Search is an absolutely fascinating problem to work on for a bunch of reasons. For one thing, you have to scale the thing before getting the first user. You can't just start with a server or two and add more when the users come. Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it.
The componentry is remarkably deep.
Search is like 7 hard problems wrapped into a stack: distributed systems, HTML analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale... Apart from the sexy high-end algos there are also the boring 10-year-old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place...
Plus there is always something fresh and new every day when you mine through the vast sordidness of the many billions of pages on the web. You expect to be amazed at the endless varieties of crazy porn domains and new approaches to webspam. But there are equal horrors in the small: pathological charset issues, previously undiscovered abominable server implementations, psychopathic website owners. The web is a reactive fuzz test.
I know there are some great coders out there reading this blog who would have a blast working on some of the pieces here that need to get built. This is a great opportunity to join an experienced team early, building a big system from the ground up. If you think you might be interested, send me an email and we can chat.
FYI, our interviews always have coding tests. Primarily we are looking for folks who love to write code and are good at it. :)
Comments (19)
" You can't just start with a server or two and add more when the users come."
Dude... just run Ruby on Rails. Problem solved!
Posted by Kevin Burton | May 3, 2008 4:35 PM
" You can't just start with a server or two and add more when the users come."
Dude... just run Ruby on Rails. Problem solved!
Posted by Kevin Burton | May 3, 2008 4:40 PM
Note recent article on Slashdot...
"According to TechCrunch, Twitter has plans to abandon Ruby on Rails after two years of scalability issues."
Posted by SHumphreys | May 3, 2008 9:39 PM
Well, the 8th and most important problem of search engines is being special, and giving more than Google. Good luck with your development, but as I have a blog search engine, I know it's a hard question.
Posted by András Bártházi | May 5, 2008 8:49 AM
Good Luck Rich!
Posted by Steve Iams | May 9, 2008 1:53 AM
Sounds like a fascinating project :)
Wish I could help >.> If I wasn't such a noob I would probably be all over this.
Best of luck to you :)
Posted by Chris Hooles | June 19, 2008 4:09 AM
Interesting. I came up with an algorithm a while back that does not rely on inbound links whatsoever as a factor. Instead it looks for both advanced and simple black-hat SEO techniques and penalizes them, then takes other variables into account to determine which website is the most relevant for a given search word/phrase.
While it uses an advanced point system, the actual points, negative or positive, paired with the content they are given to or subtracted from, are the secret behind a successful, relevant search engine.
I am curious to see how you can implement something similar to overtake Google ;)
Posted by Web Design Taxi | June 29, 2008 4:06 PM
Someone said earlier that it's important to give more than Google.
I disagree. The trick is to offer less than Google!
It's all in the AI.
Best
Jose
Posted by Jose | August 1, 2008 7:10 AM
I found your project through a crawler on my website and was curious where that crawler came from.
Good luck guys!
Greets from Germany
Marc
P.S. Sorry for my "bad" English.
Posted by Marc Leipnitz | September 15, 2008 11:09 PM
Hey, this looks good. I've seen you guys in my crawler stats; you may index my site and all links from it.
Keep up the good work!!!
holly
Posted by holly | October 31, 2008 4:47 AM
Isn't it kinda hard to just start a search engine? I mean, you might make one, but how will you get people to go onto it, rather than Google?
Good luck... you'll need it.
Posted by Roy | November 11, 2008 10:24 AM
Holly,
Google did it a number of years ago. It just takes one heck of a marketing plan and the capital to push it properly.
Good luck Rich
Posted by WilliamC | November 24, 2008 1:07 PM
Good luck with the David and Goliath thing.
P.S. Can you put my website in the #1 spot when you launch? :-)
Posted by Sean | February 20, 2009 6:35 AM
Your crawler came knocking so I wanted to see what's up. Too bad no free samples!!!
LOL
Posted by badmatty | February 21, 2009 4:56 AM
Is it possible to release some parts of the project as open source? I would love to start helping with very simple parts like discussions and creating mobile interfaces, for example.
Posted by Rafael Sanches | February 23, 2009 2:02 PM
Not to be a Debbie Downer, but if everyone's a coder, who's running the asylum?
Posted by Rotkapchen | March 19, 2009 8:38 PM
We have 10 engineers and 1 vp-of-everything-else. :)
Posted by Rich Skrenta | March 19, 2009 8:42 PM
When you mentioned the internet-as-fuzztest I thought of something that blew my mind.
Imagine someone attacking your app (which is attempting to validate, index, and carefully analyze the structure of input) every day with very ingenious and carefully crafted bogus input. Now imagine someone doing it with gigabytes of bogus input. Now imagine doing it with hundreds of gigabytes. Now imagine another 200k people doing it, and lots of them are smarter than you, way way smarter, and know way more. Now imagine your app can't crash or serve dangerous content, ever, or you're screwed. And it can't serve innocuous but bogus content, or you fail.
Good luck!
Posted by Justin Van Winkle | May 3, 2009 6:08 AM
Hey guys,
You guys are doing great. You have the knowledge and experience to do this, so don't let anyone stop you or let you down. Make sure you also focus on a good revenue plan, a good source of income for the company, and all will be great!
Wish you guys the best, and every time you feel that you are under pressure from the larger corporations, just kick some ass!
Posted by shocky | May 29, 2009 11:37 PM