Search is an absolutely fascinating problem to work on for a bunch of reasons. For one thing, you have to scale the thing before getting the first user. You can't just start with a server or two and add more when the users come. Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it.
The componentry is remarkably deep.
Search is like 7 hard problems wrapped into a stack: distributed systems, HTML analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale... Apart from the sexy high-end algos, there are also the boring 10-year-old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place...
Plus there is always something fresh and new every day when you are mining through the vast sordidness of the many billions of pages on the web. You expect to be amazed at the endless varieties of crazy porn domains and new approaches to webspam. But there are equal horrors in the small: pathological charset issues, previously undiscovered abominable server implementations, psychopathic website owners. The web is a reactive fuzz test.
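To make the charset point a bit more concrete, here is a minimal sketch of the kind of defensive decoding a crawler ends up needing. It is an illustration only, not our actual code: the function name `decode_page` and the fallback order (declared charset, then UTF-8, then Latin-1) are assumptions picked for the example.

```python
from typing import Optional, Tuple


def decode_page(body: bytes, declared_charset: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw page bytes without trusting what the server claims.

    Returns (text, charset_actually_used). Hypothetical helper for illustration.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    if "utf-8" not in candidates:
        candidates.append("utf-8")

    for name in candidates:
        try:
            return body.decode(name), name
        except (LookupError, ValueError):
            # LookupError: the declared charset name is bogus or not a text codec.
            # ValueError covers UnicodeDecodeError: the bytes don't match the claim.
            continue

    # Latin-1 maps every byte to a code point, so this last resort cannot fail.
    # The text may be wrong, but the pipeline keeps moving.
    return body.decode("latin-1"), "latin-1"


# A page that claims UTF-8 but actually ships Latin-1 bytes.
text, used = decode_page(b"caf\xe9", declared_charset="utf-8")
print(text, used)
```

A real pipeline would likely also sniff `<meta>` tags and byte-order marks, and run a statistical detector before giving up. The point is just that the declared charset is a hint, not a fact.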
I know there are some great coders out there reading this blog who would have a blast working on some of the pieces here that need to get built. This is a great opportunity to join an experienced team early, building a big system from the ground up. If you think you might be interested, send me an email and we can chat.
FYI, our interviews always have coding tests. Primarily we are looking for folks who love to write code and are good at it. :)