Archive for November, 2007

If YooName Doesn’t Know It, It Doesn’t Exist

November 20, 2007

A demo log entry attracted our attention:

“I talked with Frfrfrf yesterday.”

This made-up sentence was clearly sent by a researcher in the field. Submitting a nonsense name is one way to probe an important feature of a Named Entity Recognition (NER) system and to determine the paradigm behind it.

NER systems are either based on “lists&rules” or on sequence labelling algorithms. Take this sentence, for example:

“I talked with France yesterday.”

The first kind of system requires massive lists of entities along with contextual rules to resolve ambiguity. Using lists&rules, “France” is matched against entries in the lists of countries and first names. A disambiguation rule then assigns the right type, using contextual cues such as “talked.” This strategy works well with “France” but doesn’t handle “Frfrfrf,” as it is not in the vocabulary.
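To make the lists&rules idea concrete, here is a minimal sketch in Python. It is not YooName's code: the tiny lists, the cue verbs, and the function names are all made up for illustration, and a real system would rely on far larger lists and many more rules.

```python
# Hypothetical gazetteers; a real system would hold far larger lists.
GAZETTEERS = {
    "country": {"France", "Canada", "Japan"},
    "first_name": {"France", "Mary", "John"},
}

# Hypothetical contextual cue: verbs of communication suggest a person.
PERSON_CUE_VERBS = {"talked", "spoke", "chatted"}


def candidate_types(token):
    """Return every entity type whose list contains the token."""
    return sorted(t for t, entries in GAZETTEERS.items() if token in entries)


def classify(tokens, index):
    """Type the token at `index`, or return None if no list contains it."""
    candidates = candidate_types(tokens[index])
    if not candidates:
        return None  # out of vocabulary: the system cannot see it
    if len(candidates) == 1:
        return candidates[0]
    # Disambiguation rule: a cue verb just before the token favours a first name.
    preceding = {t.lower() for t in tokens[max(0, index - 2):index]}
    if "first_name" in candidates and preceding & PERSON_CUE_VERBS:
        return "first_name"
    return candidates[0]


known = classify("I talked with France yesterday .".split(), 3)
unknown = classify("I talked with Frfrfrf yesterday .".split(), 3)
print(known, unknown)
```

In this sketch “France” comes back as a first name because of the cue verb, while “Frfrfrf” gets no type at all: nothing in the lists, nothing to disambiguate.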

The second kind of system relies on learned probabilities of word sequences and their inner features. Using sequence labelling, the word “France” is assigned a probability for each candidate type using features such as capitalization, presence in lists of entities, prefix, suffix, and the surrounding context words. List lookup is optional, though it is recognized as a very good feature. This strategy works well with “France.” It can also handle unknown words such as “Frfrfrf” if the contextual cues are strong enough. This is exactly what happens with the LingPipe system, which annotates “Frfrfrf” as a person.
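The feature-based idea can be sketched in a few lines of Python. This is a simplification and not how LingPipe works internally: the weights below are invented rather than learned from data, the entity list is hypothetical, and each token is scored on its own instead of being decoded jointly with its neighbours as a true sequence labeller would do.

```python
import math

# Hypothetical gazetteer, used only as one optional feature.
KNOWN_ENTITIES = {"France", "Canada", "Mary"}

# Invented weights standing in for parameters a real model would learn.
WEIGHTS = {
    "capitalized": 1.0,
    "in_list": 1.5,
    "prev_word=with": 0.8,
    "prev2_word=talked": 1.2,
    "bias": -2.5,
}


def features(tokens, i):
    """Collect the names of the features that fire for the token at position i."""
    word = tokens[i]
    fired = ["bias", "suffix=" + word[-2:].lower()]
    # Sentence-initial capitals carry no signal, so position 0 is skipped.
    if i > 0 and word[0].isupper():
        fired.append("capitalized")
    if word in KNOWN_ENTITIES:
        fired.append("in_list")
    if i >= 1:
        fired.append("prev_word=" + tokens[i - 1].lower())
    if i >= 2:
        fired.append("prev2_word=" + tokens[i - 2].lower())
    return fired


def person_probability(tokens, i):
    """Logistic score: sum the weights of fired features, then squash to (0, 1)."""
    score = sum(WEIGHTS.get(name, 0.0) for name in features(tokens, i))
    return 1.0 / (1.0 + math.exp(-score))


p_known = person_probability("I talked with France yesterday".split(), 3)
p_unknown = person_probability("I talked with Frfrfrf yesterday".split(), 3)
p_other = person_probability("I talked with Frfrfrf yesterday".split(), 4)
print(round(p_known, 2), round(p_unknown, 2), round(p_other, 2))
```

With these made-up weights, “Frfrfrf” still scores above one half as a person because capitalization and the “talked with” context fire even though the list feature does not. “France” scores higher thanks to the list feature, and “yesterday” scores low.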

YooName is based on finite lists and rules. Therefore, if YooName doesn’t know it, it doesn’t exist. YooName constantly scouts the Web searching for new entities. So far, it has never found “Frfrfrf.” We believe that if it runs long enough, it may end up knowing every single named entity out there.

The Same Antique Web

November 12, 2007

It’s in the air.

We’re flooded by catchy phrases announcing it.

It’s all about semantics, AI and Web 3.0:

“Web3 is closer than you think!”
“You ain’t seen nothing yet!”
“Web as artificial intelligence supplanting human race!”

Some years ago, “you” were the superstar of Web 2.0 and its social networks.

In the late ’90s, the dot-com boom had everything going Web-based, from grocery delivery to movie rentals. It was also when Google made its debut.

Overall, it’s difficult to say if these Web movements were successful or if the whole thing was a waste of time and money. Judging from Web 1.0 Google and Web 2.0 Wikipedia, I’d say humanity has a positive balance.

Web 3.0 should be just as disruptive as its predecessors. There will be tons of new companies bridging natural language technologies and current Web content to provide what we could call “semantic hyperlinks.” If just one of these companies can find a way to resolve ambiguity, which accounts for 50% of everything we write, it will totally change the face of the Web. Right now, we must realize that all that noise and irrelevance in search engine hit lists is abnormal.

In the end, “versioning” is just part of the Web hype. What really matters is what’s always been fundamental about the Web:

People – a lot of people – sharing content.