
htbg, notes

no, this isn't part II yet, just some random thoughts I had this morning.

i'm on vacation this week so no polish, sorry. :-|

13. Both personalization and natural language approaches to search seem to mainly be about disambiguation. I've written a big disambiguation engine, one of the better commercial ones on the net. Disambiguation doesn't seem as interesting to me anymore.

Grouping terms and ranking compounds is more useful, IMO. Hence Ask's unfortunate results for queries like "lady diana car accident". Is this Edison yet?
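One way to picture "grouping terms" is query segmentation: pick the split of the query into phrases whose combined score is highest, so "lady diana car accident" becomes [lady diana] [car accident] rather than four loose words. This is a minimal sketch, assuming a hypothetical phrase-count table `PHRASE_COUNTS` (made-up numbers, not real data) and a plain product-of-probabilities score. It's not how Ask or anybody else actually does it.

```python
from functools import lru_cache

# Hypothetical phrase counts (made up for illustration); in practice these
# would come from a query log or an n-gram index over web text.
PHRASE_COUNTS = {
    "lady": 500, "diana": 300, "car": 2000, "accident": 900,
    "lady diana": 250, "car accident": 600, "diana car": 2,
}
TOTAL = 10_000  # hypothetical total used to turn counts into probabilities


def segment(query: str) -> list[str]:
    """Split a query into the phrase sequence with the highest product of probabilities."""
    words = tuple(query.lower().split())

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[float, tuple[str, ...]]:
        # Best (score, phrases) for words[start:]; an empty suffix scores 1.0.
        if start == len(words):
            return 1.0, ()
        candidates = []
        for end in range(start + 1, len(words) + 1):
            phrase = " ".join(words[start:end])
            count = PHRASE_COUNTS.get(phrase, 0)
            if count == 0 and end - start > 1:
                continue  # unseen multi-word phrases are not candidates
            count = max(count, 1)  # small floor so unseen single words still get a score
            rest_score, rest = best(end)
            candidates.append((count / TOTAL * rest_score, (phrase,) + rest))
        return max(candidates)

    return list(best(0)[1])


print(segment("lady diana car accident"))
```

With these toy counts, [lady diana] [car accident] beats every other split. Unknown single words get a tiny floor, so something like "foo car accident" still segments as [foo] [car accident].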

Full-blown question answering, apart from being something nobody actually wants, is a matter of first structuring the web and then converting English into some SQL-like query to run against it. If you could structure the web, though, you could skip the SQL business, because you'd already have 98% of the win.
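To make the two steps concrete: first a structured table of facts, then a translation from English into SQL. This is a toy sketch, assuming a hypothetical one-table "structured web" (`people`) and a couple of hand-written regex templates standing in for the English-to-SQL step. Real systems need far more than this.

```python
import re
import sqlite3

# Step 1, the hard part: a tiny hypothetical "structured web" with one table of facts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, born INTEGER, died INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Thomas Edison", 1847, 1931), ("Diana Spencer", 1961, 1997)],
)

# Step 2, the easy part: hypothetical templates mapping a question shape to a query.
TEMPLATES = [
    (re.compile(r"when was (.+) born\??$", re.I), "SELECT born FROM people WHERE name = ?"),
    (re.compile(r"when did (.+) die\??$", re.I), "SELECT died FROM people WHERE name = ?"),
]


def answer(question: str):
    """Return the fact for the first matching template, or None if nothing matches."""
    for pattern, sql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            row = conn.execute(sql, (match.group(1),)).fetchone()
            return row[0] if row else None
    return None


print(answer("When was Thomas Edison born?"))
```

Once the table exists, the template layer is almost trivial. Building and filling the table is where nearly all of the work lives.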

Sentence tagging doesn't seem that interesting. Parts of speech are a Chomskyan red herring: a set of artificial categories imposed on English. So you have a tagger with an n% error rate mapping these basically useless categories onto web text. If you were actually able to put together some kind of probabilistic parse map, you could predict completions like "I played fetch with my <x>". Classic taggers don't do much for typical queries either.
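Predicting the <x> needs word statistics, not part-of-speech tags. This is a minimal sketch, assuming a tiny made-up corpus and a crude longest-context backoff with no smoothing. It is a stand-in for a real probabilistic model, not a parser: count what follows each short word context, then use the longest context that has been seen.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 3  # look at up to three preceding words

# A tiny made-up corpus; a real model would be trained on lots of web text.
CORPUS = """
i played fetch with my dog in the park .
we played fetch with my dog all afternoon .
she played fetch with my cousin once .
he played chess with my brother .
""".split()

# following[context][word] = how often `word` came right after `context`.
following = defaultdict(Counter)
for i in range(1, len(CORPUS)):
    for n in range(1, MAX_CONTEXT + 1):
        if i - n < 0:
            break
        following[tuple(CORPUS[i - n:i])][CORPUS[i]] += 1


def complete(prefix: str, k: int = 3) -> list[tuple[str, float]]:
    """Return up to k likely next words with relative frequencies."""
    words = prefix.lower().split()
    # Back off from the longest context to shorter ones until one has been seen.
    for n in range(min(MAX_CONTEXT, len(words)), 0, -1):
        counts = following.get(tuple(words[-n:]))
        if counts:
            total = sum(counts.values())
            return [(w, c / total) for w, c in counts.most_common(k)]
    return []


print(complete("I played fetch with my"))
```

In this toy corpus it's "fetch" that pushes "dog" to the top, while "he played chess with my" gives "brother". No tagger is involved.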

Check out the great Ask patent screenshots from SEO by the Sea. So which rule(s) do they violate? We don't always really want what we think we want. Or maybe I'm wrong; it is a cool-looking mock. :)

About

This page contains a single entry from the blog posted on April 13, 2007 11:56 AM.
