<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>YooName - named entity recognition</title>
	<atom:link href="http://yooname.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://yooname.wordpress.com</link>
	<description>Semi-supervised Named Entity Recognition software.</description>
	<lastBuildDate>Thu, 30 Apr 2009 18:39:01 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='yooname.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/71bde05b40c98060185cfaa5cb9bdf77?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>YooName - named entity recognition</title>
		<link>http://yooname.wordpress.com</link>
	</image>
			<item>
		<title>YooName is *not* a search engine</title>
		<link>http://yooname.wordpress.com/2009/04/30/yooname-is-not-a-search-engine/</link>
		<comments>http://yooname.wordpress.com/2009/04/30/yooname-is-not-a-search-engine/#comments</comments>
		<pubDate>Thu, 30 Apr 2009 18:39:01 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[YooName News]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=91</guid>
		<description><![CDATA[(and other frequently given answers)
In the last few weeks, YooName traffic increased dramatically (tenfold), and so did the volume of emails. Don&#8217;t be offended if I answer your email by linking to this post. I think this is a good place and a good time to address the most frequent concerns:
1. YooName is not [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>(<em>and other frequently given answers</em>)</p>
<p>In the last few weeks, YooName traffic increased dramatically (tenfold), and so did the volume of emails. Don&#8217;t be offended if I answer your email by linking to this post. I think this is a good place and a good time to address the most frequent concerns:</p>
<p><strong>1. YooName is not a search engine</strong></p>
<p>Don&#8217;t expect YooName to return a list of web sites when you issue a query on the demo page. YooName is not a search engine. The confusion arises because we often describe YooName as a <em>potential search engine component</em>, or a <em>novel algorithm for improving web search</em>.</p>
<p>YooName is a self-improving <a href="http://en.wikipedia.org/wiki/Named_entity_recognition">named entity recognition</a> (NER) system. If you know what NER is, then you probably have an idea of how it relates to search engines. If not, the connection is less obvious. In short, NER adds structure to textual information, and structured information is important for semantic search technologies.</p>
<p><strong>2. YooName is not a commercial project per se</strong></p>
<p>YooName is a technology showcase for my PhD project.</p>
<p><strong>3. No, I didn&#8217;t hire a lawyer to write a formal privacy policy</strong></p>
<p>In order to let you sign up for the YooName demo, we collect your email. This is the simplest form of verification we could think of to avoid being scraped by robots and/or <a href="http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk">Mechanical Turk</a> workers. Also, when you send a text to the demo, it is stored in the system for statistics and quality assurance. These are two frequent privacy concerns expressed by demo users.</p>
<p><em><strong>E-mails</strong></em>: I handle the demo user email database with the greatest care. I do not share it and I do not mass-mail for fun. In fact, in the two years the demo site has existed, I haven&#8217;t used it yet. As the sign-up form says: &#8220;We will not share your e-mail. We may send you news about YooName developments. We will promptly remove your e-mail from our database upon request.&#8221;</p>
<p><strong><em>Texts</em></strong>: The texts you send to the demo are stored and used internally. This information is not shared and is destroyed periodically. Again, if you think that you sent sensitive information to the system and want it destroyed, drop me a line and I&#8217;ll wipe out the information linked to your username.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2009/04/30/yooname-is-not-a-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>Pushing Automation a Step Forward</title>
		<link>http://yooname.wordpress.com/2008/12/13/pushing-automation-a-step-forward/</link>
		<comments>http://yooname.wordpress.com/2008/12/13/pushing-automation-a-step-forward/#comments</comments>
		<pubDate>Sat, 13 Dec 2008 16:09:30 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[Bootstrapping]]></category>
		<category><![CDATA[information extraction]]></category>
		<category><![CDATA[wrapper induction]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=74</guid>
		<description><![CDATA[(and hope not to fall off the cliff)
* * *
I recently worked on the implementation of ‘Stacked Skews Model’, an algorithm proposed by Andrew Carlson and Charles Schafer.

The idea is to train a web page wrapper induction algorithm (let’s call that a ‘wrapper’) at extracting information using a small number of already trained wrappers for [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><em>(and hope not to fall off the cliff)</em></p>
<p style="text-align:center;">* * *</p>
<p><span lang="EN-CA">I recently worked on the implementation of ‘<a title="Stacked skews models" href="http://portal.acm.org/citation.cfm?id=1431967&amp;jmp=cit&amp;coll=GUIDE&amp;dl=ACM">Stacked Skews Model</a>’, an algorithm proposed by Andrew Carlson and Charles Schafer.</span></p>
<p class="MsoNormal" style="text-align:left;"><span lang="EN-CA">The idea is to train a <a title="Cohen and Fan wrappers" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.4768">web page wrapper induction algorithm</a> (let’s call that a ‘wrapper’) to extract information using a small number of already trained wrappers for sites in the same domain. For instance, if you already have four wrappers in hand for hotel booking web sites, then you can use them to bootstrap new wrappers for virtually any hotel booking web site out there. </span></p>
<p class="MsoNormal" style="text-align:center;"><span lang="EN-CA"><img class="size-medium wp-image-82 aligncenter" title="tagging" src="http://yooname.files.wordpress.com/2008/12/tagging.jpg?w=300&#038;h=218" alt="tagging" width="300" height="218" /></span></p>
<p class="MsoNormal" style="text-align:center;"><span lang="EN-CA"><strong><em>sample web page wrapper annotations</em></strong></span></p>
<p class="MsoNormal"><span lang="EN-CA">What’s clever in Carlson and Schafer’s solution is how it overcomes the lack of annotated examples, given the huge search space for such a problem: it works on feature distributions and distribution divergences instead of relying directly on surface evidence. In other words, when the system learns what the name of a hotel is, it learns how each feature is distributed and how similar a solution must be (e.g., hotel name length is around 20 characters, hotel names often contain the trigram ‘<strong>hot</strong>el’ or ‘<strong>res</strong>ort’, etc.). It is basically equivalent to creating one classifier for each feature and, as the authors suggest, <a title="Stacked generalization" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.1533">stacking them using linear regression</a>.</span></p>
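<p>To make the idea of comparing feature distributions concrete, here is a minimal Python sketch. It is not the published model or the implementation discussed in this post: the Jensen-Shannon divergence, the 5-character length buckets, the function names and the sample strings are all assumptions chosen for illustration. It only shows how each feature can yield one divergence score between known hotel names and a candidate field on a new site.</p>

```python
import math
from collections import Counter


def trigram_counts(strings):
    """Count character trigrams over a collection of strings."""
    counts = Counter()
    for text in strings:
        lowered = text.lower()
        for start in range(len(lowered) - 2):
            counts[lowered[start:start + 3]] += 1
    return counts


def length_counts(strings, bucket=5):
    """Count string lengths, grouped into buckets of `bucket` characters."""
    return Counter(len(text) // bucket for text in strings)


def normalize(counts):
    """Turn raw counts into a probability distribution (empty if no counts)."""
    total = sum(counts.values())
    if not total:
        return {}
    return {key: value / total for key, value in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    mixture = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in set(p) | set(q)}

    def kl_to_mixture(dist):
        return sum(prob * math.log2(prob / mixture[key]) for key, prob in dist.items())

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def feature_divergences(known, candidates):
    """One divergence per feature; lower means closer to the known examples."""
    return {
        "trigrams": js_divergence(normalize(trigram_counts(known)),
                                  normalize(trigram_counts(candidates))),
        "length": js_divergence(normalize(length_counts(known)),
                                normalize(length_counts(candidates))),
    }


# Hypothetical data: names extracted by existing wrappers, then two fields of a new site.
known_names = ["Grand Hotel Lisboa", "Seaside Resort and Spa", "Hotel du Parc", "Mountain View Resort"]
name_column = ["Harbour Hotel", "Palm Resort", "City Hotel Central"]
amenity_column = ["Free WiFi", "Pool", "Parking", "Breakfast included"]

name_scores = feature_divergences(known_names, name_column)
amenity_scores = feature_divergences(known_names, amenity_column)
print(name_scores)
print(amenity_scores)
```

<p>In a stacked setup, per-feature scores like these would be the inputs to a second-level model (linear regression in the paper) that learns how much weight each feature deserves. That fitting step is left out of the sketch.</p>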
<p class="MsoNormal"><span lang="EN-CA">My implementation didn’t exactly work as advertised, which is normal ;) Even if stacked models reduce the feature space and diminish overfitting, the problem is still enormous and one or two features tend to dominate the stack. However, I made some important progress by playing around with the published ideas. </span></p>
<p class="MsoNormal"><span lang="EN-CA">First, do connect to an ontology. OK, I’m not a big fan of ontological features and only use them as a last resort, but here it made a real difference. When wrapping hotel web sites, connect to the <a title="WordNet" href="http://wordnet.princeton.edu/">WordNet</a> synset ‘hotel’ and use all synonyms and related words as features. </span></p>
<p class="MsoNormal"><span lang="EN-CA">Also, do use DOM tree features. In their article, Carlson and Schafer limit the learning to features on textual information (the current node text and the previous node text). However, the DOM tree is very useful here. For instance, desirable information tends to sit deep in the tree, with the fields almost juxtaposed. Also, a hotel name is more likely to be in its own HTML tag (bold, header, etc.) while amenities are often enumerated (lists, tables, etc.).</span></p>
<p class="MsoNormal"><span lang="EN-CA">Finally, in order to reduce overfitting further, I split the feature space into independent groups and applied a voting scheme over the ensemble.</span></p>
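<p>The voting idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three feature groups, their hand-written rules, the keyword list and the label names are hypothetical stand-ins for trained classifiers, and a plain majority vote is just one possible voting scheme.</p>

```python
from collections import Counter


def length_group(text):
    """Vote from length features alone (hypothetical rule: 8 to 30 characters)."""
    return "hotel_name" if len(text) in range(8, 31) else "other"


def keyword_group(text):
    """Vote from lexical features alone (hypothetical keyword list)."""
    keywords = ("hotel", "resort", "lodge")
    lowered = text.lower()
    return "hotel_name" if any(word in lowered for word in keywords) else "other"


def markup_group(tag):
    """Vote from DOM features alone: the name of the enclosing tag."""
    return "hotel_name" if tag in ("h1", "h2", "b", "strong") else "other"


def vote(text, tag):
    """Majority vote over the three independent feature groups."""
    ballots = Counter([length_group(text), keyword_group(text), markup_group(tag)])
    label, _count = ballots.most_common(1)[0]
    return label


# Hypothetical nodes from a hotel page: (text, enclosing tag).
for text, tag in [("Grand Hotel Lisboa", "h1"), ("Seaside Resort", "li"), ("Free WiFi", "li")]:
    print(text, tag, vote(text, tag))
```

<p>With an odd number of groups and two labels there is never a tie, and a single misleading group (here the length rule on &#8220;Free WiFi&#8221;) is outvoted by the other two.</p>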
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/12/13/pushing-automation-a-step-forward/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>

		<media:content url="http://yooname.files.wordpress.com/2008/12/tagging.jpg?w=300" medium="image">
			<media:title type="html">tagging</media:title>
		</media:content>
	</item>
		<item>
		<title>The New York Times Annotated Corpus</title>
		<link>http://yooname.wordpress.com/2008/11/01/the-new-york-times-annotated-corpus/</link>
		<comments>http://yooname.wordpress.com/2008/11/01/the-new-york-times-annotated-corpus/#comments</comments>
		<pubDate>Sat, 01 Nov 2008 12:27:32 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[corpus]]></category>
		<category><![CDATA[nyt]]></category>
		<category><![CDATA[text mining]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=71</guid>
		<description><![CDATA[The New York Times just released (through LDC) a gigantic corpus including:
Over 1.5 million articles manually tagged by The New York Times Index Department with a normalized indexing vocabulary of people, organizations, locations and topic descriptors. [...] Articles are tagged for persons, places, organizations, titles and topics using a controlled vocabulary that [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The New York Times just released (<a title="NYT text mining corpus" href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">through LDC</a>) a gigantic corpus including:</p>
<blockquote><p>Over 1.5 million articles manually tagged by The New York Times Index Department with a normalized indexing vocabulary of people, organizations, locations and topic descriptors. [...] Articles are tagged for persons, places, organizations, titles and topics using a controlled vocabulary that is applied consistently across articles. For instance if one article mentions &#8220;Bill Clinton&#8221; and another refers to &#8220;President William Jefferson Clinton&#8221;, both articles will be tagged with &#8220;CLINTON, BILL&#8221;.</p></blockquote>
<p>According to <a href="http://groups.google.com/group/nytnlp/web/new_york_times_annotated_corpus.pdf">the documentation</a>, there are hand-assigned meta annotations (describing text content) using a controlled vocabulary:</p>
<ul>
<li>1.3M persons</li>
<li>600k locations</li>
<li>600k organizations</li>
</ul>
<p>as well as algorithmically assigned and manually verified online annotations (tagged within the text):</p>
<ul>
<li>114k persons</li>
<li>124k locations</li>
<li>136k organizations</li>
</ul>
<p>Thanks to <a href="http://www.apperceptual.com">Peter</a> for forwarding <a href="http://thenoisychannel.com/2008/10/31/all-the-news-thats-fit-to-text-mine/">the news</a>.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/11/01/the-new-york-times-annotated-corpus/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>Semantic Knowledge Discovery, Organization and Use</title>
		<link>http://yooname.wordpress.com/2008/08/28/semantic-knowledge-discovery-organization-and-use/</link>
		<comments>http://yooname.wordpress.com/2008/08/28/semantic-knowledge-discovery-organization-and-use/#comments</comments>
		<pubDate>Fri, 29 Aug 2008 01:12:15 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[nyu]]></category>
		<category><![CDATA[Semantic Knowledge Discovery]]></category>
		<category><![CDATA[symposium]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=61</guid>
		<description><![CDATA[New York University will host a symposium on Semantic Knowledge Discovery in mid-November. The program will consist of invited talks by leaders in the field (see below) and, in part, of general submissions.

The focus of NLP research has been shifting towards semantic analysis from syntactic analysis. It has become evident that the methods employed for [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>New York University will host a <a href="http://nlp.cs.nyu.edu/sk-symposium/">symposium on Semantic Knowledge Discovery</a> in mid-November. The program will consist of invited talks by leaders in the field (see below) and, in part, of general submissions.</p>
<p>The focus of NLP research has been shifting towards semantic analysis from syntactic analysis. It has become evident that the methods employed for developing syntactic analyzers, i.e. supervised methods using small annotated corpora, are not the best methods for the semantic task. In order to handle semantics, we need large amounts of knowledge which may be best collected by semi/un-supervised methods from a huge unannotated corpus.</p>
<p><a href="http://yooname.files.wordpress.com/2008/08/leaders-semantic-knowledge-discovery1.png"><img class="aligncenter size-large wp-image-65" src="http://yooname.files.wordpress.com/2008/08/leaders-semantic-knowledge-discovery1.png?w=450&#038;h=271" alt="" width="450" height="271" /></a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/08/28/semantic-knowledge-discovery-organization-and-use/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>

		<media:content url="http://yooname.files.wordpress.com/2008/08/leaders-semantic-knowledge-discovery1.png?w=450" medium="image" />
	</item>
		<item>
		<title>Interesting new research problem</title>
		<link>http://yooname.wordpress.com/2008/06/25/interesting-new-research-problem/</link>
		<comments>http://yooname.wordpress.com/2008/06/25/interesting-new-research-problem/#comments</comments>
		<pubDate>Wed, 25 Jun 2008 12:32:55 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[google news]]></category>
		<category><![CDATA[news aggregation]]></category>
		<category><![CDATA[research problem]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=60</guid>
		<description><![CDATA[ 
This article, found via digg, highlights an inherent &#8216;by-design&#8217; flaw of automatic news aggregators, including Google News: they need a significant amount of press coverage before promoting news to their front page. As a result, automatic news aggregators are often hours late in covering breaking news.
The solution to the problem of &#8220;finding the most [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br />
<p><a href="http://www.networkworld.com/community/node/29230">This article</a>, found <a href="http://digg.com/tech_news/When_major_stories_break_Google_News_dawdles">via digg</a>, highlights an inherent &#8216;by-design&#8217; flaw of automatic news aggregators, including Google News: they need a significant amount of press coverage before promoting news to their front page. As a result, automatic news aggregators are often hours late in covering breaking news.</p>
<p>The solution to the problem of &#8220;finding the most important news right now&#8221; <em>cannot</em> rely on an hour or so of news history. After an hour, it is no longer breaking news. It is late and repetitive.</p>
<p>Let&#8217;s formulate a challenging research problem from that: &#8220;Given a novel and unique news item, can you predict that there will be thousands of repetitions and reformulations?&#8221;</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/06/25/interesting-new-research-problem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>DayLife Developer Challenge</title>
		<link>http://yooname.wordpress.com/2008/06/13/daylife-developer-challenge/</link>
		<comments>http://yooname.wordpress.com/2008/06/13/daylife-developer-challenge/#comments</comments>
		<pubDate>Fri, 13 Jun 2008 17:13:09 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[DayLife]]></category>
		<category><![CDATA[news aggregation]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=57</guid>
		<description><![CDATA[DayLife is staging a challenge from June 3rd to July 25th (extended date):
Build the future of news, in software!
Build an application that uses the Daylife API. No limits here: mashups, portals, widgets, iphone apps, blogging plugins, you name it.


DayLife is a news aggregation platform with strong named entity (NE) recognition capability. NEs are also [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>DayLife is <a title="DayPI contest" href="http://developer.daylife.com/contest">staging a challenge</a> from June 3rd to <span style="color:#808080;"><span style="text-decoration:line-through;">June</span></span> July 25th (extended date):</p>
<blockquote><p>Build the future of news, in software!</p>
<p>Build an application that uses the Daylife API. No limits here: mashups, portals, widgets, iphone apps, blogging plugins, you name it.</p>
<p style="text-align:center;"><a href="http://yooname.files.wordpress.com/2008/06/dlpbwhthoriz.jpg"><img class="alignnone size-medium wp-image-58 aligncenter" src="http://yooname.files.wordpress.com/2008/06/dlpbwhthoriz.jpg?w=300&#038;h=74" alt="DayLife challenge" width="300" height="74" /></a></p>
</blockquote>
<p>DayLife is a news aggregation platform with strong named entity (NE) recognition capability. NEs are also called &#8216;Topics&#8217;, and they fall under the types &#8216;Person&#8217;, &#8216;Place&#8217; and &#8216;Organization&#8217;.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/06/13/daylife-developer-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>

		<media:content url="http://yooname.files.wordpress.com/2008/06/dlpbwhthoriz.jpg?w=300" medium="image">
			<media:title type="html">DayLife challenge</media:title>
		</media:content>
	</item>
		<item>
		<title>Ontology is Overrated</title>
		<link>http://yooname.wordpress.com/2008/06/05/ontology-is-overrated/</link>
		<comments>http://yooname.wordpress.com/2008/06/05/ontology-is-overrated/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 03:12:17 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[categorization]]></category>
		<category><![CDATA[folksonomy]]></category>
		<category><![CDATA[ontology]]></category>
		<category><![CDATA[tags]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=56</guid>
		<description><![CDATA[This is an extract (a summary by sentence extraction &#8211; the way old-school text summarizers used to do it ;) of Clay Shirky&#8217;s blog post titled &#8216;Ontology is Overrated&#8217;.
* * *
Today I want to talk about categorization, and [...] I want to convince you that many of the ways we&#8217;re attempting to apply categorization to the [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>This is an extract (a summary by sentence extraction &#8211; the way old-school <a href="http://www.copernic.com/en/products/summarizer/">text summarizers</a> used to do it ;) of Clay Shirky&#8217;s blog post titled &#8216;<a href="http://www.shirky.com/writings/ontology_overrated.html">Ontology is Overrated</a>&#8217;.</p>
<p style="text-align:center;"><strong>* * *</strong></p>
<p style="text-align:left;">Today I want to talk about categorization, and [...] I want to convince you that many of the ways we&#8217;re attempting to apply categorization to the electronic world are actually a bad fit.</p>
<p>What I think is coming instead are [...] organic ways of organizing information [...], based on two units &#8212; the link, which can point to anything, and the tag, which is a way of attaching labels to links.</p>
<p><strong>PART I: Classification and Its Discontents</strong></p>
<p>The question ontology asks is: What kinds of things exist or can exist in the world, and what manner of relations can those things have to each other?</p>
<p>If you&#8217;ve got a large, ill-defined corpus, if you&#8217;ve got naive users, if your cataloguers aren&#8217;t expert, if there&#8217;s no one to say authoritatively what&#8217;s going on, then ontology is going to be a bad strategy.</p>
<p>One of the biggest problems with categorizing things in advance is that it forces the categorizers [...] to guess what their users are thinking, and to make predictions about the future.</p>
<p>When people [are] offered search [e.g., Web search] and categorization [e.g., Web directory] side-by-side, fewer and fewer people [are] using categorization to find things.</p>
<p><strong>Part II: The Only Group That Can Categorize Everything Is Everybody</strong></p>
<p>Now imagine a world where everything can have a unique identifier. This should be easy, since that&#8217;s the world we currently live in &#8212; the URL gives us a way to create a globally unique ID for anything we need to point to.</p>
<p>And once you can do that, anyone can label those pointers, can tag those URLs, in ways that make them more valuable, and all without requiring top-down organization schemes.</p>
<p>As [Joshua] Schachter says of del.icio.us, &#8220;Each individual categorization scheme is worth less than a professional categorization scheme. But there are many, many more of them.&#8221; If you find a way to make it valuable to individuals to tag their stuff, you&#8217;ll generate a lot more data about any given object than if you pay a professional to tag it once and only once.</p>
<p>Well-managed, well-groomed organizational schemes get worse with scale, both because the costs of supporting such schemes at large volumes are prohibitive, and, as I noted earlier, scaling over time is also a serious problem. Tagging, by contrast, gets better with scale. With a multiplicity of points of view the question isn&#8217;t &#8220;Is everyone tagging any given link &#8216;correctly&#8217;&#8221;, but rather &#8220;Is anyone tagging it the way I do?&#8221; As long as at least one other person tags something the way you would, you&#8217;ll find it [...].</p>
<p>We are moving away from binary categorization &#8212; books either are or are not entertainment &#8212; and into this probabilistic world, where N% of users think books are entertainment.</p>
<p style="text-align:center;"><strong>* * *</strong></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/06/05/ontology-is-overrated/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>Difficult to Pwn IM Language iykwimaityd</title>
		<link>http://yooname.wordpress.com/2008/06/01/difficult-to-pwn-im-language-iykwimaityd/</link>
		<comments>http://yooname.wordpress.com/2008/06/01/difficult-to-pwn-im-language-iykwimaityd/#comments</comments>
		<pubDate>Sun, 01 Jun 2008 19:10:34 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[YooName News]]></category>
		<category><![CDATA[IM]]></category>
		<category><![CDATA[Instant messaging]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[slang]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=55</guid>
		<description><![CDATA[Researchers at the University of Toronto, Canada, suggest that instant messaging represents &#8220;an expansive new linguistic renaissance&#8221; (story from New Scientist.)
We&#8217;ve tried seeding YooName with a list of well-known Internet slang expressions such as LOL, brb, and OMG.
YooName found 993 pages on the Internet containing a lexicon (or structured repository) of Internet slang, and it collected [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Researchers at the University of Toronto, Canada, suggest that instant messaging represents &#8220;an expansive new linguistic renaissance&#8221; (<a href="http://technology.newscientist.com/article/mg19826566.600-instant-messaging-a-linguistic-renaissance-for-teens.html">story from New Scientist</a>.)</p>
<p>We&#8217;ve tried seeding YooName with a list of well-known Internet slang expressions such as LOL, brb, and OMG.</p>
<p>YooName found 993 pages on the Internet containing a lexicon (or structured repository) of Internet slang, and it collected a list of 1,718 unique expressions. Interestingly, more than a quarter of these expressions are ambiguous with other categories of words: for example, brb (be right back) is also a ticker symbol, lol (laugh out loud) is <a href="http://www.geonames.org/maps/google_-4.35_144.05.html">a place in Papua New Guinea</a>, and asap (as soon as possible) is also the name of a company.</p>
<p>We&#8217;ve updated YooName&#8217;s lexicon and rule system to recognize and annotate Internet slang&#8230; but because of its high ambiguity and unconventional syntax, it is very difficult to pwn!</p>
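<p>To make that ambiguity concrete, here is a minimal Python sketch. It is not YooName&#8217;s actual code: the tiny lexicons, the category names and the two function names are made up for illustration, and the entries are limited to the examples mentioned in this post (the expansion given for OMG is our own assumption).</p>
<pre>
# Toy lexicons for illustration only (hypothetical, not YooName data).
SLANG = {
    "brb": "be right back",
    "lol": "laugh out loud",
    "asap": "as soon as possible",
    "omg": "oh my god",
}

# Other categories in which the same strings can appear.
OTHER_CATEGORIES = {
    "ticker symbol": {"brb"},
    "place": {"lol"},
    "company": {"asap"},
}


def ambiguous_categories(expression):
    """Return the sorted non-slang categories that also contain the expression."""
    key = expression.lower()
    return sorted(
        name for name, entries in OTHER_CATEGORIES.items() if key in entries
    )


def ambiguity_ratio():
    """Return the share of slang expressions found in at least one other category."""
    ambiguous = [term for term in SLANG if ambiguous_categories(term)]
    return len(ambiguous) / len(SLANG)


if __name__ == "__main__":
    for term in SLANG:
        print(term, ambiguous_categories(term))
    print(ambiguity_ratio())
</pre>
<p>A lookup like this only flags that an expression is ambiguous. Deciding which reading applies in a given sentence still takes context, and that is where the unconventional syntax of IM language gets in the way.</p>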
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/06/01/difficult-to-pwn-im-language-iykwimaityd/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>Top 5 Natural Language Processing Applications</title>
		<link>http://yooname.wordpress.com/2008/05/13/top-5-natural-language-processing-applications/</link>
		<comments>http://yooname.wordpress.com/2008/05/13/top-5-natural-language-processing-applications/#comments</comments>
		<pubDate>Tue, 13 May 2008 12:31:23 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[NE Ecosystem]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[chat bot]]></category>
		<category><![CDATA[knowledge discovery]]></category>
		<category><![CDATA[machine translation]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[speech recognition]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=49</guid>
		<description><![CDATA[In recent decades, Natural Language Processing (NLP) has been equally hyped and criticized. All in all, many applications have emerged in the real world following intense and continued research and development. Here&#8217;s a list of the most prominent success stories.
Given that this blog is about named entity recognition (NER), itself an NLP application, we would [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>In recent decades, Natural Language Processing (NLP) has been equally hyped and criticized. All in all, many applications have emerged in the real world following intense and continued research and development. Here&#8217;s a list of the most prominent success stories.</p>
<p>Given that this blog is about named entity recognition (NER), itself an NLP application, we would be biased in including NER in this list. As such, we&#8217;ve excluded ourselves from the chart-toppers ;)</p>
<p><strong>#5: Chat bots</strong></p>
<pre>"HELLO, MY NAME IS DOCTOR SBAITSO.
I AM HERE TO HELP YOU.
SAY WHATEVER IS ON YOUR MIND FREELY,
OUR CONVERSATION WILL BE KEPT IN THE STRICTEST CONFIDENCE.
MEMORY CONTENTS WILL BE WIPED CLEAN AFTER YOU LEAVE,
SO, TELL ME ABOUT YOUR PROBLEMS."</pre>
<p>The first time I chatted with <a href="http://en.wikipedia.org/wiki/Dr._Sbaitso">Dr. Sbaitso</a>, I was about 12 years old. Probably more than anything else, it has influenced my career path. Since then, chat bots such as <a href="http://en.wikipedia.org/wiki/ELIZA">ELIZA</a>, <a href="http://en.wikipedia.org/wiki/Artificial_Linguistic_Internet_Computer_Entity">A.L.I.C.E.</a> and <a href="http://en.wikipedia.org/wiki/Jabberwacky">Jabberwacky</a> propelled the art of conversational robots, leading to <a href="http://www.microsoft.com/serviceproviders/solutions/asa.mspx">Automated Service Agent</a> applications (see <a href="http://www.nextit.com/">NextIT</a>).</p>
<p>For its lasting impact on generations of NLP developers, and for the interesting improvements that ensued, chat bots rank #5.</p>
<p><strong>#4: NLP-based search engines</strong></p>
<p><a href="http://www.ask.com/">Ask Jeeves</a> pioneered it, <a href="http://www.powerset.com/">Powerset</a> redefined it, but we are all somewhat skeptical when it comes to beating Google&#8217;s classic vector space models and ranking techniques. Do we really need shallow NLP parsing to answer &#8220;When did Einstein die,&#8221; or will <a href="http://scholar.google.com/scholar?hl=en&amp;q=the+One-Million+Fact+Extraction+Challenge">statistical fact extraction</a> suffice?</p>
<p>Though it is the Holy Grail of NLPers, it has not yet surpassed current information retrieval techniques. As such, NLP-based search engines rank #4.</p>
<p><strong>#3: Speech recognition</strong></p>
<p>Microsoft and Ford just teamed up to develop in-car speech recognition. But they forgot to include <a href="http://en.wikipedia.org/wiki/Electronic_Voice_Alert">Electronic Voice Alert</a>, a feature of mid-80s luxury Chrysler cars!</p>
<p>In all seriousness, automatic speech recognition (ASR) is a vital application for hands-free computing (for disabled persons or for certain circumstances, such as driving), and <a href="http://www.nuance.com/naturallyspeaking/">transcription</a>. It is also poised to revolutionize <a href="http://www.coveo.com/en/Products/CAVS.aspx">audio-video content</a> retrieval.</p>
<p>For where it came from, and for where it&#8217;s going, ASR ranks #3.</p>
<p><strong>#2: Machine translation</strong></p>
<p><em>&#8220;It is apparent to me that the possibilities of the aeroplane, which two or three years ago were thought to hold the solution to the [flying machine] problem, have been exhausted, and that we must turn elsewhere.&#8221;</em> &#8211; <a title="http://en.wikiquote.org/wiki/Incorrect_predictions#Airplanes" href="http://en.wikiquote.org/wiki/Incorrect_predictions#Airplanes">Thomas Edison, inventor, 1895</a></p>
<p>The &#8220;heavier-than-air&#8221; problem that once plagued flight technology is probably the <a href="http://apperceptual.wordpress.com/2008/01/07/artificial-intelligence-considered-as-heavier-than-air-flight/">best comparison we can make to AI</a> and machine translation (MT). It was long believed that MT would require a completely automatic understanding of human language before a resolution finally came. But today&#8217;s <a href="http://www.google.com/translate_t">Google</a> and <a href="http://iit-iti.nrc-cnrc.gc.ca/projects-projets/portage_e.html">Government of Canada</a> systems surpass human translation abilities (can you translate from French to Chinese? Not me.) Their good level of precision makes them useful in many applications.</p>
<p>People are constantly <a href="http://www.news.com/8301-13577_3-9857280-36.html">pinpointing these systems&#8217; shortcomings,</a> but nobody would contest their second-place ranking on this list.</p>
<p><strong>#1: Knowledge discovery in texts</strong></p>
<p>Have you ever heard of software that finds new relationships and interactions between genes, proteins or cells? By mining large collections of scientific literature, NLP agents can discover and highlight novel and surprising knowledge.</p>
<p>What makes knowledge discovery so promising is the hope that, in the near future, we may monitor all the documents that are just too abundant to be processed manually. Early forms of knowledge discovery, such as <a href="http://en.wikipedia.org/wiki/Data_mining">data mining</a>, are already used for Business Intelligence (BI), and outside the NLP world, examples of <a href="http://www.popsci.com/scitech/article/2006-04/john-koza-has-built-invention-machine">machine-made inventions</a> already exist.</p>
<p>As a form of <a href="http://en.wikipedia.org/wiki/Technological_singularity">technological singularity</a>, and as an emerging field of research for NLP, knowledge discovery gets first place on this list of top NLP applications.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/05/13/top-5-natural-language-processing-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
		<item>
		<title>YooName&#8217;s creator honored at 2008 OCRI awards</title>
		<link>http://yooname.wordpress.com/2008/04/08/yoonames-creator-honored-at-2008-ocri-awards/</link>
		<comments>http://yooname.wordpress.com/2008/04/08/yoonames-creator-honored-at-2008-ocri-awards/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 19:17:40 +0000</pubDate>
		<dc:creator>yooname</dc:creator>
				<category><![CDATA[YooName News]]></category>

		<guid isPermaLink="false">http://yooname.wordpress.com/?p=54</guid>
		<description><![CDATA[OTTAWA, Canada, April 3, 2008 – OCRI, Ottawa&#8217;s economic development agency, honoured Ottawa&#8217;s best and brightest companies, executives and students for their innovative work and contributions to the city&#8217;s knowledge-based sector at the 13th annual OCRI Awards gala.
David Nadeau from the University of Ottawa received the Student Researcher of the Year award for inspired research [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><strong>OTTAWA, Canada, April 3, 2008</strong> – <a href="http://www.ocri.ca/">OCRI</a>, Ottawa&#8217;s economic development agency, honoured Ottawa&#8217;s best and brightest companies, executives and students for their innovative work and contributions to the city&#8217;s knowledge-based sector at the 13th annual OCRI Awards gala.</p>
<p>David Nadeau from the University of Ottawa received the <strong>Student Researcher of the Year award</strong> for inspired research resulting in a more intelligent on-line search engine and for his commercialization efforts, which launched YooName.com last November.</p>
<p>[<a href="http://www.ocri.ca/email_broadcasts/newsreleases/040308news_e.html">see the full press release</a>]</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://yooname.wordpress.com/2008/04/08/yoonames-creator-honored-at-2008-ocri-awards/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0c4376092afee27f7825ed676aeafbb1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yooname</media:title>
		</media:content>
	</item>
	</channel>
</rss>