Scientific Contributions that Shaped NER

The Named Entity Recognition (NER) field was born over fifteen years ago. It is often suggested that Lisa F. Rau’s paper, “Extracting Company Names from Text” (1991), is the root of all NER work. The Message Understanding Conferences (MUC) are also noteworthy for coining much of the terminology (the expression “named entity recognition” itself; entity classes such as enamex, numex, and timex; etc.). For this reason, all MUC participants deserve credit for many of the field’s structuring ideas. Even so, here’s a list of five scientific contributions that stand out from the crowd.

1993: McDonald’s internal and external evidence.

In his original paper (a version published in 1996 is also available), McDonald argues that an NER system needs two kinds of evidence: evidence internal to the named entity (NE), meaning the structure of the name itself, and external evidence, meaning the textual context surrounding the name. He also introduces the first NER paradigm, which is still in use today: Delimit, Classify, Record.
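The Delimit, Classify, Record loop can be illustrated with a toy sketch. This is not McDonald’s system: the cue lists (INTERNAL, EXTERNAL_BEFORE, EXTERNAL_AFTER), the capitalization heuristic, and all function names are assumptions invented for this example.

```python
# Toy cue lists, invented for illustration only.
INTERNAL = {"Mr.": "person", "Dr.": "person", "Inc.": "organization", "Corp.": "organization"}
EXTERNAL_BEFORE = {"in": "location"}   # word just before the name
EXTERNAL_AFTER = {"said": "person"}    # word just after the name


def delimit(tokens):
    """Delimit: return (start, end) spans of maximal runs of capitalized tokens."""
    spans, start = [], None
    for i, tok in enumerate(tokens + [""]):  # the sentinel closes a trailing run
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    return spans


def classify(tokens, start, end, record):
    """Classify: try internal evidence, then external evidence, then the record."""
    name = tokens[start:end]
    for tok in name:  # internal evidence: the structure of the name
        if tok in INTERNAL:
            return INTERNAL[tok]
    if start > 0 and tokens[start - 1] in EXTERNAL_BEFORE:  # external evidence
        return EXTERNAL_BEFORE[tokens[start - 1]]
    if end < len(tokens) and tokens[end] in EXTERNAL_AFTER:
        return EXTERNAL_AFTER[tokens[end]]
    for tok in name:  # fall back on names recorded earlier in the text
        if tok in record:
            return record[tok]
    return "unknown"


def recognize(text):
    tokens = text.split()
    record, found = {}, []
    for start, end in delimit(tokens):
        label = classify(tokens, start, end, record)
        if label != "unknown":
            for tok in tokens[start:end]:  # Record: remember each name token
                record[tok] = label
        found.append((" ".join(tokens[start:end]), label))
    return found


print(recognize("Mr. John Smith joined Acme Corp. in Boston , and Smith agreed ."))
```

In this example the second mention of “Smith” has no internal or external cue of its own; it is labeled only because the first mention was recorded, which is the point of the Record step.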


1997: Bikel et al.’s HMM system.

Nymble is widely cited as the prototypical HMM-based system (a second NER paradigm). With a good set of features and sound HMM training and decoding, Nymble’s performance rivals human annotators’ precision on a specific corpus. Nymble is the foundation of the commercial BBN IdentiFinder system.
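To give a feel for HMM decoding, here is a minimal Viterbi sketch over a generic first-order HMM with two states. It does not reproduce Nymble’s model, which is richer; the states, the probability tables, and the `unk` smoothing constant are all made up for illustration.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely state sequence for `words` (log-space Viterbi)."""
    # Each column maps state -> (best log-probability, previous state).
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s].get(words[0], unk)), None)
        for s in states
    }]
    for w in words[1:]:
        col = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            col[s] = (score + math.log(emit_p[s].get(w, unk)), prev)
        best.append(col)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for col in reversed(best[1:]):
        path.append(col[path[-1]][1])
    return path[::-1]


# Hand-set toy parameters (assumptions, not estimated from any corpus).
STATES = ["NAME", "OTHER"]
START = {"NAME": 0.3, "OTHER": 0.7}
TRANS = {"NAME": {"NAME": 0.5, "OTHER": 0.5}, "OTHER": {"NAME": 0.3, "OTHER": 0.7}}
EMIT = {
    "NAME": {"Smith": 0.5, "Jones": 0.5},
    "OTHER": {"Mr.": 0.3, "said": 0.4, "the": 0.3},
}

print(viterbi(["Mr.", "Smith", "said"], STATES, START, TRANS, EMIT))
```

In a real system the tables would be estimated from an annotated corpus, and unseen words would be handled with proper smoothing or word features rather than a single constant.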


1997: Palmer and Day’s statistical profile of the task.

Palmer and Day’s paper is a valuable resource for any background work in NER. It addresses the crucial properties of names in text: the mean length of NEs; the relative proportion of the basic types (enamex); the vocabulary transfer in a typical annotated corpus; and the baseline strategy for NER. The study covers six languages.
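Statistics of this kind are simple to compute once a corpus is annotated. The sketch below works on toy name lists and uses simplified definitions that are assumptions of this example, not necessarily the exact ones in the paper: vocabulary transfer is taken as the share of distinct test names already seen in training, and the baseline simply memorizes training names.

```python
def mean_length(names):
    """Average number of whitespace-separated tokens per name."""
    return sum(len(n.split()) for n in names) / len(names)


def transfer_rate(train_names, test_names):
    """Share of distinct test names that also occur in the training names."""
    seen, distinct = set(train_names), set(test_names)
    return len(distinct & seen) / len(distinct)


def memorization_recall(train_names, test_names):
    """Recall of a baseline that tags only names memorized from training."""
    seen = set(train_names)
    return sum(n in seen for n in test_names) / len(test_names)


# Toy annotated name mentions, invented for illustration.
train = ["John Smith", "Acme Corp.", "Boston", "United Nations"]
test = ["Boston", "Boston", "Mary Jones", "United Nations"]

print(mean_length(train))
print(transfer_rate(train, test))
print(memorization_recall(train, test))
```

The transfer rate and the memorization recall differ because the first counts distinct names while the second counts every mention, so frequent names weigh more in the baseline.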


1999: Cucerzan and Yarowsky’s bootstrapping.

There is a paradigm shift towards unsupervised and semi-supervised techniques in the NER field, and in the machine-learning field in general. This work pioneered the idea by showing how a very small seed of exemplar NEs can be bootstrapped into large and precise NE lists, paired with contextual evidence.
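The general bootstrapping idea can be sketched in a few lines. This is a generic seed-and-context loop, not Cucerzan and Yarowsky’s actual algorithm; the corpus, the single-word left context, and the capitalization test are simplifying assumptions made for this example.

```python
def bootstrap(sentences, seeds, rounds=2):
    """Alternately learn contexts from known names and names from known contexts."""
    names, contexts = set(seeds), set()
    for _ in range(rounds):
        # Step 1: collect the word that precedes each known name.
        for sent in sentences:
            toks = sent.split()
            for i in range(1, len(toks)):
                if toks[i] in names:
                    contexts.add(toks[i - 1])
        # Step 2: harvest capitalized words that follow a learned context.
        for sent in sentences:
            toks = sent.split()
            for i in range(1, len(toks)):
                if toks[i - 1] in contexts and toks[i][:1].isupper():
                    names.add(toks[i])
    return names, contexts


# Toy corpus, invented for illustration.
corpus = [
    "flights to Paris depart daily",
    "flights to Rome depart daily",
    "she lives in Rome now",
    "he lives in Oslo now",
]

names, contexts = bootstrap(corpus, {"Paris"})
print(sorted(names), sorted(contexts))
```

Starting from the single seed “Paris”, the first round learns the context “to” and finds “Rome”; the second round learns “in” from “Rome” and finds “Oslo”. A context as loose as one preceding word would quickly pull in wrong names on real text, which is why practical systems score and filter both names and contexts.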


2002: Sekine et al.’s type hierarchy.

Gone are the days when the world was divided into three NE types (person, location, organization). The real world is divided into hundreds of types, all with essential fine-grained distinctions. Sekine et al.’s hierarchy was designed to reflect well-known thesauri divisions of proper names, as well as the current scope of NER systems and common NE types found in newspapers.



