Search is an absolutely fascinating problem to work on for a bunch of reasons. For one thing, you have to scale the thing before getting the first user. You can't just start with a server or two and add more when the users come. Step 1 is to copy the internet onto your cluster. Step 2 is to analyze it.
The componentry is remarkably deep.
Search is like 7 hard problems wrapped into a stack: distributed systems, HTML analytics, text analytics/semantics, anti-spam, AI/ML, frontend/UI. And scale... Apart from the sexy high-end algos there are also the boring 10-year-old system libraries and off-the-shelf tools that crack under stress and sometimes need a look. You open the hood and wonder how the thing ever worked in the first place...
Plus there is always something fresh and new every day when you are mining through the vast sordidness of the many billions of pages on the web. You expect to be amazed at the endless varieties of crazy porn domains and new approaches to webspam. But there are equal horrors in the small: pathological charset issues, previously undiscovered abominable server implementations, psychopathic website owners. The web is a reactive fuzz test.
I know there are some great coders out there reading this blog who would have a blast working on some of the pieces here that need to get built. This is a great opportunity to join an experienced team early, building a big system from the ground up. If you think you might be interested, send me an email and we can chat.
FYI, our interviews always have coding tests. Primarily we are looking for folks who love to write code and are good at it. :)
Comments (34)
" You can't just start with a server or two and add more when the users come."
Dude... just run Ruby on Rails. Problem solved!
Posted by Kevin Burton | May 3, 2008 4:35 PM
Note recent article on Slashdot...
"According to TechCrunch, Twitter has plans to abandon Ruby on Rails after two years of scalability issues."
Posted by SHumphreys | May 3, 2008 9:39 PM
Well, the 8th and most important problem of search engines is being special, and giving more than Google. Good luck with your development, but as I have a blog search engine, I know it's a hard question.
Posted by András Bártházi | May 5, 2008 8:49 AM
Good Luck Rich!
Posted by Steve Iams | May 9, 2008 1:53 AM
Sounds like a fascinating project :)
Wish I could help >.> If I wasn't such a noob I would probably be all over this.
Best of luck to you :)
Posted by Chris Hooles | June 19, 2008 4:09 AM
Interesting. I came up with an algorithm a while back that does not rely on inbound links whatsoever as a factor. Instead it looks for both advanced and simple black-hat SEO tricks and penalizes them, then takes other variables into account to determine which website is the most relevant for a given search word/phrase.
While it uses an advanced point system, the actual points, negative or positive, paired with the content they are added to or subtracted from, are the secret behind a successful, relevant search engine.
I am curious to see how you can implement something similar to overtake Google ;)
Posted by Web Design Taxi | June 29, 2008 4:06 PM
Someone said earlier that it's important to give more than Google.
I disagree. The trick is to offer less than Google!
It's all in the AI.
Best
Jose
Posted by Jose | August 1, 2008 7:10 AM
I found your project through a crawler on my website, and I was curious where that crawler came from.
Good luck guys !
Greets from Germany
Marc
P.S. Sorry for my "bad" English
Posted by Marc Leipnitz | September 15, 2008 11:09 PM
Hey, this looks good. I've seen you guys in my crawler stats; you may index my site and all links from it.
keep up the good work!!!
holly
Posted by holly | October 31, 2008 4:47 AM
Isn't it kinda hard to just start a search engine? I mean, you might make one, but how will you get people to go onto it, rather than Google?
Good luck...you'll need it
Posted by Roy | November 11, 2008 10:24 AM
Holly,
Google did it a number of years ago. It just takes one heck of a marketing plan and the capital to push it properly.
Good luck Rich
Posted by WilliamC | November 24, 2008 1:07 PM
discovered your bot via my log file
and seeing the ip as coming from psi/cogenco
(I have always blocked psi ip's due to bad bots
and bad activity)....
I also read the tidbit of your history that you
wrote (one of the first) viruses. It is something
that I would not be proud of, and I would not want
a bot hitting my site knowing that aspect....
I will be blocking your bot(s) and the IP range
from psi (robots.txt file is usually rather
useless and is often ignored, so I use .htaccess,
which is much more effective).
Doubtful that you could create a SE better than
the existing major ones, MSN, Yahoo, and Google
(the latter is poor since they started with page
rank that yanked out small sites and they have
bad policies eg can sandbox without telling anyone
and without valid reasons).
Posted by george | February 20, 2009 4:34 AM
Good Luck with the David and Goliath Thing.
PS, Can you put my website on the #1 Spot when you launch :-)
Posted by Sean | February 20, 2009 6:35 AM
Your crawler came knocking so I wanted to see what's up. Too bad no free samples!!!
LOL
Posted by badmatty | February 21, 2009 4:56 AM
Is it possible to open some parts of the project as open source? I would love to start helping with very simple parts like discussions and creating mobile interfaces, for example.
Posted by Rafael Sanches | February 23, 2009 2:02 PM
Not to be a Debbie Downer, but if everyone's a coder, who's running the asylum?
Posted by Rotkapchen | March 19, 2009 8:38 PM
We have 10 engineers and 1 vp-of-everything-else. :)
Posted by Rich Skrenta | March 19, 2009 8:42 PM
Hi,
Want any graphics or UI stuff doing?
If so, drop me some specs and I'll send you some stuff. Glossy, aqua, grungy etc. all fine. Happy to do something different too.
DS_UK
Posted by Dogsolitude_uk | March 24, 2009 4:58 AM
I found this after noticing the crawler on my site. However, my site has been up and running since December and it appears this is the first visit. Google hits my site several times a day. I'm not sure how they can do this and still hit such a massive number of sites. I can't imagine what it's going to take to beat Google's infrastructure.
Posted by ray malone | April 24, 2009 6:09 AM
I'm interested in working at blekko.
Here is my resume. I'm uniquely intelligent and creative.
Reed S. Kotler
1030 East El Camino Real, Suite 278
Sunnyvale, CA 94087
main: (408) 836–3774 alternate: (408)730–9557
website: http://www.reedkotler.com
http://www.toriirecords.com
email: reedkotler@hotmail.com
Objective
Find a job with challenging problems requiring a uniquely creative, persistent and intelligent individual.
Programming and Systems Expertise
Able to quickly develop programming solutions to solve client problems using:
C, C++ and C#, Objective C, Java, Lisp, FORTRAN, Ada, Pascal, Modula 2, Matlab
ADO.NET, ASP.NET, MFC
Java, JavaScript, HTML, Perl, PHP, Python, MySQL, AJAX
Windows, Unix/Linux, Mac OS, WinCE, Windows Mobile
Microsoft SQL
Assembly Languages (80x86 family, 68k family, PPC family, MIPS family, ARM Family, Space Shuttle AP101)
Windows GUI, Mac and iPhone GUI/Cocoa
Protools (110 certification), Reason, Final Cut Express, Avid
Adobe CS 4
Experience
Reed Kotler Systems, Inc. (President and co founder) 1997 – present
Created commercial audio/music analysis hardware devices and software-only versions, (Windows and Mac) available from www.reedkotler.com
Developed Unix-like development toolset for Windows 95/NT (port of GNU tools).
Consulting for WebTV/Microsoft, MIPS, Palm, IBM, Lockheed, Sun Microsystems, and others.
GCC/GDB/GNU rehosting and modification.
Independent Consultant to various clients, 1986 – present
Maintained Microsoft Platform Builder debugger for WIN CE for Microsoft
Developed TV Set top box and mobile software for various Microsoft Products
Developed Crashlog software for several Microsoft TV Products which included an extensive .NET server application.
Developed Server application for Microsoft for Billing and Subscription for a Microsoft TV Product.
Development on ISI Searchlight Debugger. Ported to ARM, PPC, MIPS, 68K, others.
Ported Sun's debugger from Solaris to HP-UX.
Ported MetaWare C/C++ compiler to Windows NT/98. Worked on PPC elf linker.
Design of sophisticated AI software for Lockheed Satellite program
Development on Intelligence System for ESL.
Designed a large and complex relational database application written in Ada to support a sophisticated satellite ground station. Performed the database structure and requirements analysis as well as software design and coding.
Development on intelligence system written in Ada. Designed and implemented database services for the application making complex use of UNIX system services for shared memory, semaphores, and TCP.
Consulting related to the Ada programming language, overall software systems design and prototyping, DBMS application design, and software systems requirements analysis.
Troubleshot Ada code in weather-forecasting satellite ground station resolving various concurrency issues and other problems.
President and founder of Reed Kotler Music Inc, 1998 – present
Development of hardware and software music products.
Produced TR-1000 and TR-400 Digital music study recorders
Produced LBR-100 lead/bass isolator
Produced Midi Brick synthesizer
Managed employees, did marketing and sales, trade shows, supervised manufacturing, parts procurement. Products had software, electrical and mechanical design components.
President and founder of Torii Records, Inc. 2001 – present
Production and marketing of Jazz music.
Produced 8 CDs
Three CDs were in the top 10 in the USA on jazz radio (one at #2, one at #7, one at #9), two others reached #11 and #19, and one was nominated for a Grammy in 2005. All CDs have been played extensively on jazz radio in the USA, on satellite radio, on cable TV radio, and in Europe and Canada.
Composer/Musician – present
Internationally known composer. Most recent CD was #9 in the USA on jazz radio.
Faculty member of Stanford Jazz Summer Residency Program for over 10 years. Teaching composition, transcribing, theory, harmony, improvisation theory.
Plays Piano, Guitar, Saxophone, Bass, Drums, Latin Percussion.
Staff transcriber for many years for Jazz Improv Magazine.
General Transformation Corporation, Berkeley, CA (VP Engineering) 1984-1986
Design and implementation of Ada language compiler for the IBM PC under DOS.
Design and implementation of full LR(1) parser generator and other development tools including a file comparison program (written up in Jerry Pournelle’s BYTE magazine column.)
Lockheed Missiles and Space Co., Sunnyvale, CA (Scientific Programming Analyst) 1981 – 1984
Design and implementation of INGRES-like relational database management system written in Ada.
Participated in (and received commendations for) review work on the Ada language and reference manual.
Chief designer and technical supervisor for generic Communications, Command, Control and Intelligence (C3I) system written in Ada.
Strategic Information Burlington, MA (Computer Scientist) 1980-1981
Development on design and implementation of proprietary language for econometric forecasting.
Intermetrics Inc. Cambridge, MA (Sr. Sys. Analyst/ Programmer) 1975 - 1977
Support of company’s Space Shuttle program, including development of tools for software configuration, management, and quality assurance.
Maintenance of mathematics and character libraries for HAL/S compilers for primary onboard space shuttle computers.
Maintenance and rehosting of XPL language compilers.
Received award from NASA for a technical innovation.
Foreign Languages
TRKI 2 level Russian.
Education
Antioch College (Yellow Springs, Ohio), BS in Mathematics, 1977.
Posted by reed kotler | May 1, 2009 9:19 PM
When you mentioned the internet-as-fuzztest I thought of something that blew my mind.
Imagine someone attacking your app (which is attempting to validate, index, and carefully analyze the structure of input) every day with very ingenious and carefully crafted bogus input. Now imagine someone doing it with gigabytes of bogus input. Now imagine doing it with hundreds of gigabytes. Now imagine another 200k people doing it, and lots of them are smarter than you, way way smarter, and know way more. Now imagine your app can't crash or serve dangerous content, ever, or you're screwed. And it can't serve innocuous but bogus content, or you fail.
Good luck!
Posted by Justin Van Winkle | May 3, 2009 6:08 AM
Hey guys,
You guys are doing great, you have the knowledge and experience to do this, don't let anyone stop you or let you down. Make sure you also focus on a good revenue plan, a good source of income for the company and all will be great!
Wish you guys the best, and every time you feel that you are under pressure from the larger corporations, just kick some ass!
Posted by shocky | May 29, 2009 11:37 PM
Do you need an experienced Quality Assurance Tester? I would love the opportunity to work on this project. Please feel free to contact me!
Posted by Rick Blodgett | June 14, 2009 9:02 PM
And...who's your management team and do they need any help at that upper level? Let me know, am looking for interesting ways to make Yahoo & MSN quiver in their boots. If someone displaces them and starts to put the clamps on Google...life will be interesting.
Michael Murdock, CEO
DocMurdock.com
Posted by Michael Murdock | July 27, 2009 12:39 PM
Good luck! Hopefully you guys will get at least 1% of the market share! If you need a designer to do some work, even for free (for now) lemme know. =)
Posted by Mark | July 27, 2009 6:02 PM
Congrats on your recent funding success! Hope all goes well!
Posted by Brandon Justice | July 28, 2009 6:02 AM
Hey, this is a great idea. But can you tell me/us when the beta is starting? I'd like to get my hands on it.
c-ya
n
Posted by Nauck IT Consulting | July 28, 2009 7:32 AM
With lots of Web 2.0 sites yet to be properly crawled and indexed by older engines, technically there is a gap for improvement - and business.
I think u guyz can pull it off and build a valuable product with some web analytics and collective intel techniques.
Posted by Eric Kotonya | July 28, 2009 9:39 AM
Search works when you know exactly what you want and how the world refers to it. Otherwise, you're SOL!
When presented to the user, search is exceedingly simple. Do the search results fit that user's contextual needs at that moment? Or, like Google, Bing, Wisenut, MSN, Yahoo, and all the other "search engines", did you pepper them with offers for "Nikes" when they were there researching arch designs for flat-footed autistic kids? If they searched on "that info", you and I know they'd have seen everything offered from soup to nuts, so even search specificity has been programmed out of the user's search toolkit by the irrelevance-inspired keywords industry!
The long-tail exists mostly because nobody wants that stuff. Not the first time, and certainly not offers 2 through 7,954 from Overstock.com and Smartbuys.com.
Nope, if you can contextualize the user's visit and consider "now" to be more than a mere regurgitation of "then", perhaps you can then use your signal processing skills applied to streams of "cogently-classified" data points presented as a sequence of internet transactions to predict what that user is looking for right now!
Culturally cogent content classification is the only workable answer, and using semiotics and signal processing is the way I do it now, instead of pretending to do it as before with vector cosines and high-performance computing scientists making a mess of every analytic they need but can't quite figure out why!
TV
Posted by tim vogel | July 28, 2009 11:24 AM
So, now that Yahoo! has finally given in to Microsoft.. what does this mean for Blekko? Still moving forward? Any updates?
By the way, I do like the name. Maybe change it to something that sounds powerful like that, but that doesn't sound dumb when using it instead of "google". You can't really "bing" things. "Blekkoing" sounds off too (and I know this is only a stealth name), but you might be on to something..
Posted by socialnerdia | July 29, 2009 1:11 PM
Hello, I am a college student VERY interested in things like distributed systems and natural language processing. I'd like to help work on this project any way I can, even for free, or at least open some channels of communication. I can script Java, Lisp, and C++.
Posted by Nathan | September 1, 2009 9:27 PM
I was reading this blog, and am very fond of the idea of a new search engine to replace/challenge Google.
I own the domain name WhySearch.com and would be interested in making it part of this endeavor, if this is still on track.
Eli
Posted by eli | October 5, 2009 2:15 AM
Just checked out the beta. Wow, when you look back at this post and see all the work that came together, you guys have done a great job. I wasn't expecting much but may actually start using Blekko as my default search.
Awesome work guys, the SEO features are great.
Posted by Hornswaggled | July 29, 2010 4:34 PM