<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - doubt - Comments</title>
 <link>http://www.nodalpoint.org/nodalpoint_tags/doubt</link>
 <description>Comments for &quot;doubt&quot;</description>
 <language>en</language>
<item>
 <title>String-blasting</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3678</link>
 <description>&lt;p&gt;It&#039;s useful to apply BLAST-like techniques for searching over strings.  Several groups have used this to find variations of gene names in text.  First, &lt;a href=&quot;http://www.yalepath.org/facultydb/id=KrauthammerM.htm&quot;&gt;Michael Krauthammer&lt;/a&gt;, when he was at Columbia, used base-pairs to encode arbitrary strings (they normally encode amino acids), then queried gene and protein names for matches in the text of journal articles.  &lt;/p&gt;
&lt;p&gt;BLAST is really just edit distance with some exclusion heuristics that work poorly on short strings.  So it&#039;s more natural to implement edit distance matching directly, as &lt;a href=&quot;http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/acl03bio.pdf&quot;&gt;Tsuruoka and Tsujii&lt;/a&gt; did.   There&#039;s a nice description of the algorithms in &lt;a href=&quot;http://wwwcsif.cs.ucdavis.edu/~gusfield/&quot;&gt;Dan Gusfield&lt;/a&gt;&#039;s &lt;a href=&quot;http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521585194&quot;&gt;string algorithm bible&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;Our LingPipe software provides an implementation of approximate dictionary matching following Gusfield.  Here&#039;s a link to the class Javadoc:  &lt;a href=&quot;http://www.alias-i.com/lingpipe/docs/api/com/aliasi/dict/ApproxDictionaryChunker.html&quot;&gt;com.aliasi.dict.ApproxDictionaryChunker&lt;/a&gt;.  We provide Tsuruoka and Tsujii&#039;s distance metric as a constant, but the distances are plug-and-play.&lt;/p&gt;
&lt;p&gt;The really critical issue here is not just finding approximate matches of names of biomedical entities, but also disambiguating them.  The acronym &quot;ACT&quot; means a lot of different things in different contexts.  Figuring out which sense of a word or phrase is intended is a widely studied problem, usually called word sense disambiguation for common nouns and database linkage for proper nouns.  It can be done either by unsupervised clustering or, if there are example contexts, by supervised database linkage.  Luckily, databases such as Entrez and KEGG provide GeneRIFs, which include pointers to articles about specific genes.  And evaluations like &lt;a href=&quot;http://biocreative.sourceforge.net/&quot;&gt;Biocreative&lt;/a&gt; are measuring how well systems can figure out which gene is being mentioned in an article.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.colloquial.com/carp&quot;&gt;Bob Carpenter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.alias-i.com/lingpipe&quot;&gt;Alias-i, Inc.&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 20 Jun 2007 13:01:43 -0400</pubDate>
 <dc:creator>Bob Carpenter</dc:creator>
 <guid isPermaLink="false">comment 3678 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Flawed, but useful analogy: Bloogle</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3674</link>
 <description>&lt;p&gt;I think Google is still a useful analogy for explaining BLAST to wet bench biologists, even if it does have its flaws. As for &quot;Google isn&#039;t statistical&quot;, I disagree. What about all the statistics, probability and machine learning they use to build and improve search results? Google (and other search engines) have very well defined metrics for measuring search quality; it&#039;s not all subjective. So despite its problems, search is still a handy analogy for describing BLAST, and one that many people will be familiar with.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 18 Jun 2007 07:29:40 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3674 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Analogy</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3673</link>
 <description>&lt;p&gt;Or, BLAST is like a microwave oven, except for sequences and not frozen burritos.&lt;/p&gt;
&lt;p&gt;But seriously, I think the Google analogy isn&#039;t very good because&lt;br /&gt;
1) In BLAST you don&#039;t search by subject, but with another sequence. If Google worked that way, you&#039;d give it a web page and it would find web pages similar to it.&lt;br /&gt;
2) BLAST is statistical, Google isn&#039;t. The only measure of a Google search&#039;s quality is the (subjective) opinion of the searcher.&lt;/p&gt;
&lt;p&gt;If you want an analogy, assuming the students know some bench biology, how about &quot;BLAST is an electronic Southern Blot&quot;?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 18 Jun 2007 07:02:50 -0400</pubDate>
 <dc:creator>Jonathan_Badger</dc:creator>
 <guid isPermaLink="false">comment 3673 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
