Wikipedia’s Inside

To many, ourselves included, Wikipedia is the most wonderful Web site ever created. That’s why we’re very excited to announce that we’ve included much of Wikipedia’s latent knowledge in YooName. The short story is that analyzing Wikipedia more than doubled YooName’s knowledge: it went from 175,000 to 400,000 entities, about 2.3 times its previous size (an increase of roughly 130%).

The long story is as follows…

One of YooName’s most valuable assets is its automatically generated named entity lists. Lists are created by set expansion techniques [our paper], as seen on Google Sets, and, more recently, on Seal.

This iterative algorithm has a random factor: it bootstraps its knowledge by scouting the Web, starting from the highest-confidence knowledge of the latest iteration. Sometimes the algorithm hits Wikipedia, sometimes it doesn’t. However, we know that Wikipedia is full of collaboratively maintained, high-quality lists of named entities. All in all, we found 32,000 pages in the Wikipedia dump that present lists with at least 10 elements. Of these lists, 1,500 overlapped strongly with YooName’s existing knowledge, and we forced our set expansion algorithm to hit them.
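
To make the selection step concrete, here is a minimal sketch in Python of how lists could be filtered by size and by overlap with what is already known. It is an illustration only, not YooName’s actual code: the function name `select_seed_lists`, the `min_overlap` threshold of 0.5, and the case-insensitive matching are all assumptions. Only the minimum size of 10 elements comes from the description above.

```python
def select_seed_lists(candidate_lists, known_entities, min_size=10, min_overlap=0.5):
    """Pick lists that are big enough and overlap strongly with known entities.

    candidate_lists: dict mapping a list title to its items (strings).
    known_entities:  iterable of entity names already in the lexicon.
    min_size:        minimum number of distinct items (10, as in the post).
    min_overlap:     hypothetical threshold; the real cut-off is not stated.
    """
    # Assumption: names are compared case-insensitively.
    known = {name.casefold() for name in known_entities}

    selected = []
    for title, items in candidate_lists.items():
        distinct = {item.casefold() for item in items}
        if len(distinct) < min_size:
            continue  # too small to count as a useful list
        # Fraction of the list's items that the lexicon already contains.
        overlap = len(distinct & known) / len(distinct)
        if overlap >= min_overlap:
            selected.append((title, overlap))

    # Strongest overlap first, so the best seeds are expanded first.
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected
```

Lists that pass this filter are good candidates for expansion: the items already known confirm what the list is about, and the remaining items are the new entities to learn.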

2 Responses to “Wikipedia’s Inside”

  1. GeoNames’ Inside « YooName - named entity recognition Says:

    […] Using the same modus operandi as Wikipedia, we just included much of GeoNames‘ 8 million entries into YooName. Our lexicons increased by […]

  2. Prashanth Ellina Says:

    Wikipedia dumps are a great idea. I am exploring interesting ways to use them. I am posting progress here http://blog.prashanthellina.com/2007/10/17/ways-to-process-and-use-wikipedia-dumps/
