If YooName Doesn’t Know It, It Doesn’t Exist

A demo log entry attracted our attention:

“I talked with Frfrfrf yesterday.”

This made-up sentence was clearly sent by a researcher in the field. This is one way to test an important feature of a Named Entity Recognition (NER) system, and to determine the paradigm behind it.

NER systems are either based on “lists&rules” or on sequence labelling algorithms. Take this sentence, for example:

“I talked with France yesterday.”

The first system requires massive lists of entities along with contextual rules to resolve ambiguity. Using lists&rules, “France” is matched against entries in the lists for countries and first names. A disambiguation rule then assigns the right type, using contextual cues such as “talked.” This strategy works well with “France” but cannot handle “Frfrfrf,” as it is not in the vocabulary.
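To make the lists&rules idea concrete, here is a minimal sketch in Python. The tiny gazetteers, the cue words, and the function name tag_entity are all hypothetical and chosen only for illustration; this is not YooName's actual code, and a real system would rely on far larger lists and many more rules.

```python
# Toy gazetteers (hypothetical); a real lists&rules system uses massive lists.
GAZETTEERS = {
    "country": {"France", "Canada"},
    "first_name": {"France", "Marie"},
}

# Hypothetical contextual cues: verbs of communication suggest a person.
PERSON_CUES = {"talked", "spoke", "chatted"}


def tag_entity(tokens, i):
    """Return an entity type for tokens[i], or None if the word is in no list."""
    word = tokens[i]
    candidates = [etype for etype, entries in GAZETTEERS.items() if word in entries]
    if not candidates:
        return None  # out of vocabulary: for this system, the entity does not exist
    if len(candidates) == 1:
        return candidates[0]
    # Disambiguation rule: look at the two preceding words for a person cue.
    window = tokens[max(0, i - 2):i]
    if "first_name" in candidates and any(w.lower() in PERSON_CUES for w in window):
        return "first_name"
    return candidates[0]


print(tag_entity("I talked with France yesterday".split(), 3))
print(tag_entity("I talked with Frfrfrf yesterday".split(), 3))
```

With these toy lists, “France” is found in two lists and the cue “talked” settles it as a first name, while “Frfrfrf” matches nothing and gets no annotation at all.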

The second system relies on learned probabilities of word sequences and their inner features. Using sequence labelling, the word “France” is assigned a probability for each candidate type using features such as capitalization, presence in lists of entities, prefix, suffix, and contextual word features. List lookup is optional, though it is recognized as a very good feature. This strategy works well with “France.” It can also handle unknown words such as “Frfrfrf” if the contextual cues are strong enough. This is exactly what happens with the LingPipe system, which annotates “Frfrfrf” as a person.
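The kind of features a sequence labeller looks at can be sketched as follows. This is only an illustration of feature extraction, under the assumption of a simple whitespace tokenizer; the function name token_features and the feature names are hypothetical, and this is not how LingPipe is implemented. A trained model (not shown) would turn these features into type probabilities.

```python
def token_features(tokens, i, gazetteer=frozenset()):
    """Build a feature dictionary for tokens[i] from the word and its context."""
    word = tokens[i]
    return {
        "is_capitalized": word[:1].isupper(),
        "in_gazetteer": word in gazetteer,  # optional list lookup
        "prefix3": word[:3].lower(),
        "suffix3": word[-3:].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "prev2_word": tokens[i - 2].lower() if i > 1 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


features = token_features("I talked with Frfrfrf yesterday".split(), 3)
for name, value in features.items():
    print(name, "=", value)
```

Even though “Frfrfrf” is in no list, it is still capitalized and still preceded by “talked with,” so a model trained on such features has contextual evidence to label it as a person.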

YooName is based on finite lists and rules. Therefore, if YooName doesn’t know it, it doesn’t exist. YooName constantly scouts the Web searching for new entities. So far, it has never found “Frfrfrf.” We believe that if it runs long enough, it may end up knowing every single named entity out there.
