Archive for October, 2007

Combining NEs with Social Networks

October 22, 2007

What happens when you combine Named Entity Recognition with Social Networks? Do they “blend”? We may have some insight when Twine reveals its platform. Until then, O’Reilly Radar provides some information on the idea.



October 11, 2007

Google and Microsoft are both active in the Named Entity Recognition (NER) field, and more notably, in Named Entity Disambiguation. This task consists of “disambiguating between multiple named entities that can be denoted by the same proper name” (Bunescu and Pasca 2006). For instance, politicians, Internet entrepreneurs and criminals share the name James Clark. And yes, these are all distinct entities.

Well-known NER researchers at Google and Microsoft published the following papers:

These are two very nice pieces of work that deserve an attentive read. What motivates this research is clear:

“A frequent case are queries about named entities, which constitute a significant fraction of popular Web queries according to search engine logs. When submitting queries such as John Williams or Python, search engine users could also be presented with a compilation of facts and specific attributes about those named entities, rather than a set of best-matching Web pages.”


The Need for a Prescriptive Ontology

October 9, 2007

A great deal of effort is invested in universities, research labs and companies to create prescriptive ontologies. Just think about large-scale projects such as Cyc/OpenCyc or smaller projects built around OWL.

I use the term “prescriptive” to emphasize the fact that ontologies are usually defined in a hard-coded and formal manner. Let’s use the “Hotel” type, for example. Elements of this hypothetical ontology are capitalized and relations are in brackets:


| Hotel <is a> Building

| Hotel <is located in> City, State/Province, Country

| Hotel <is located near> Attractions

| Hotel <offers service> Parking, Pool, Gym, InternetAccess, etc.

| Hotel <has parts> HotelRoom

| HotelRoom <price> MoneyQuantity

| HotelRoom <rent> TimePeriod
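
One way to picture how "hard-coded" such an ontology is: the relations above can be written down as a fixed list of triples. The following is only a minimal sketch under that assumption. The names `TRIPLES`, `build_index` and `is_allowed` are hypothetical and are not part of OWL, Cyc or any other toolkit.

```python
from collections import defaultdict

# Each triple is (subject, relation, object), copied from the list above.
# The "etc." in the services line is deliberately not expanded.
TRIPLES = [
    ("Hotel", "is a", "Building"),
    ("Hotel", "is located in", "City"),
    ("Hotel", "is located in", "State/Province"),
    ("Hotel", "is located in", "Country"),
    ("Hotel", "is located near", "Attractions"),
    ("Hotel", "offers service", "Parking"),
    ("Hotel", "offers service", "Pool"),
    ("Hotel", "offers service", "Gym"),
    ("Hotel", "offers service", "InternetAccess"),
    ("Hotel", "has parts", "HotelRoom"),
    ("HotelRoom", "price", "MoneyQuantity"),
    ("HotelRoom", "rent", "TimePeriod"),
]


def build_index(triples):
    """Group objects by subject and relation for quick lookup."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, relation, obj in triples:
        index[subject][relation].append(obj)
    return index


def is_allowed(index, subject, relation, obj):
    """Return True only if the ontology explicitly lists this triple."""
    return obj in index.get(subject, {}).get(relation, [])


index = build_index(TRIPLES)
print(is_allowed(index, "Hotel", "offers service", "Parking"))
print(is_allowed(index, "Hotel", "offers service", "PetCare"))
```

Anything that was not anticipated when the list was written, such as a hypothetical `PetCare` service, is simply rejected.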


The majority of semantic Web developers would agree this ontology is quite handy in the development of a hotel-related-semantic-web-2.0 application. But is it always handy? For how long? And is it really necessary?

Is it always handy?

Is this prototypal Hotel representative of all hotels? Clearly not. What about an ice hotel that melts (we would need a start and end date, as we do with an event)? What about cultural, local and special services (e.g., pet care, special shuttles, places of worship)? We can argue that there will always be a hotel with atypical characteristics.

For how long?

How long can these relations remain valid and when will new relations develop? Take the smart phone example. The first ontology for portable phones probably had no place for features such as “mail client, Internet browser, music player.” What about the recent trend of “boutique hotels?” Does our ontology represent it? What modifications must be made now and in the future?

Is it really necessary?

That’s the real question. Are prescriptive ontologies really necessary? What if we try to develop a semantic Web application without such an ontology? Could a descriptive/soft/bottom-up/empirical ontology be sufficient?

To return to the original scenario, let’s imagine an information extraction system that crawls the Web to try to fill the Hotel template for “City, Attractions and Parking.” Using a prescriptive ontology, we can literally attach a pattern to each of these slots and hope it will work well. With the help of good programmers, we can be sure the problem will be elegantly resolved, and with high accuracy. The advantage here is the predictability of a template-filling task. The disadvantage is the ontology’s incomplete nature and the maintenance it requires over time. Chances are that new features will simply go unnoticed, and this new ice hotel with sleigh-only parking will not be adequately represented in the current model.
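
To make "attaching a pattern to a slot" concrete, here is a rough sketch that assumes each slot is filled by a single regular expression. The patterns and the name `fill_template` are invented for illustration. A real extractor would need far more variants.

```python
import re

# Hypothetical patterns, one per slot of the Hotel template.
SLOT_PATTERNS = {
    "City": re.compile(
        r"located in (?:downtown )?([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    ),
    "Attractions": re.compile(
        r"(?:near|close to) the ([A-Z][\w ]+?)(?:[.,]|$)"
    ),
    "Parking": re.compile(
        r"[Pp]arking(?: is|:)? (free|not available|\$\d+ (?:a|per) day)"
    ),
}


def fill_template(text):
    """Fill each slot with the first match, or None when nothing matches."""
    template = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        template[slot] = match.group(1) if match else None
    return template


typical = "The hotel is located in Quebec City, near the Ice Museum. Parking is $25 a day."
atypical = "The ice hotel offers sleigh-only parking."
print(fill_template(typical))
print(fill_template(atypical))
```

The first sentence fills all three slots. The second leaves them all empty, because "sleigh-only parking" is not a value the patterns anticipate.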

In machine learning terms, a descriptive ontology is analogous to clustering. Instead of starting with a sharp definition of the world, we invest the time of our good programmers in identifying pages on hotels and clustering their content in order to find typical patterns. Not surprisingly, the word “parking” would appear with frequently co-occurring phrases such as “not available,” “free,” and “$25 a day.” Moreover, named entities such as cities, museums and monuments would also co-occur frequently. We can imagine quickly generating a template containing these frequent and distinctive elements. As time goes by, new features may become prominent, indicating that maintenance is required. The advantage of this empirical ontology is its boundary-free description of entity relations. The disadvantage is a higher potential for noise and concept drift, which would require manual post-editing.
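
As a very rough illustration of the bottom-up idea, the sketch below counts which words appear near a target word across a handful of pages. This is plain co-occurrence counting, a much simpler stand-in for real clustering. The function name `cooccurring_terms`, the window size and the sample pages are all assumptions made for the example.

```python
import re
from collections import Counter


def cooccurring_terms(pages, target, window=2):
    """Count tokens found within `window` positions of `target`."""
    counts = Counter()
    for page in pages:
        # Lowercase and keep letters, digits and "$" so prices survive.
        tokens = re.findall(r"[a-z0-9$]+", page.lower())
        for i, token in enumerate(tokens):
            if token == target:
                start = max(0, i - window)
                neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbors)
    return counts


# Made-up page snippets standing in for crawled hotel pages.
pages = [
    "Free parking for guests.",
    "Parking is free near the museum.",
    "Parking not available.",
]
counts = cooccurring_terms(pages, "parking")
print(counts.most_common(3))
```

A usable system would also filter stop words and keep only terms that are distinctive for hotel pages. Still, even this toy count surfaces "free" as a typical companion of "parking" without anyone having declared a Parking slot in advance.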

Recent interest and successes in unsupervised learning techniques suggest the second option, or a combination of both options, is viable and promising.



Bootstrapped learning beats AI

October 2, 2007

The EE Times has a story about a program called “Bootstrapped Learning,” developed by DARPA:

“The goal of Bootstrapped Learning (BL) is to develop an ‘electronic student’ that can be taught complex concepts incrementally over a very wide range of problem domains-without designing domain knowledge into algorithms.”

This kind of algorithm is closely related to semi-supervised and active learning. Even when conventional AI outperforms these approaches, they boast one important benefit. When a system learns incrementally with little human knowledge, it essentially maintains itself. Or, at least, it can be maintained or extended at a very low cost.

GeoNames’ Inside

October 1, 2007

Using the same modus operandi as with Wikipedia, we just incorporated much of GeoNames’ 8 million entries into YooName. Our lexicons increased by 750%, to more than 3,000,000 named entities.

YooName is now powered by GeoNames and Wikipedia.

Did you know these common words could stand for city/town names?