<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - blast - Comments</title>
 <link>http://www.nodalpoint.org/nodalpoint_tags/blast</link>
 <description>Comments for &quot;blast&quot;</description>
 <language>en</language>
<item>
 <title>String-blasting</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3678</link>
 <description>&lt;p&gt;It&#039;s useful to apply BLAST-like techniques for searching over strings.  Several groups have used this to find variations of gene names in text.  First, &lt;a href=&quot;http://www.yalepath.org/facultydb/id=KrauthammerM.htm&quot;&gt;Michael Krauthammer&lt;/a&gt;, when he was at Columbia, used base-pairs to encode arbitrary strings (they&#039;re usually encoding amino acids), then queried gene and protein names for matches in the text of journal articles.  &lt;/p&gt;
&lt;p&gt;BLAST is really just edit distance with some exclusion heuristics which don&#039;t work at all well on small strings.  So it&#039;s more natural to implement this notion directly, as &lt;a href=&quot;http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/acl03bio.pdf&quot;&gt;Tsuruoka and Tsujii&lt;/a&gt; did.   There&#039;s a nice description of the algorithms in &lt;a href=&quot;http://wwwcsif.cs.ucdavis.edu/~gusfield/&quot;&gt;Dan Gusfield&lt;/a&gt;&#039;s &lt;a href=&quot;http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521585194&quot;&gt;string algorithm bible&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;Our LingPipe software provides an implementation of approximate dictionary matching following Gusfield.  Here&#039;s a link to the class Javadoc:  &lt;a href=&quot;http://www.alias-i.com/lingpipe/docs/api/com/aliasi/dict/ApproxDictionaryChunker.html&quot;&gt;com.aliasi.dict.ApproxDictionaryChunker&lt;/a&gt;.  We provide Tsuruoka and Tsujii&#039;s distance metric as a constant, but the distances are plug-and-play.&lt;/p&gt;
&lt;p&gt;The really critical issue here is not just finding approximate matches of names of biomedical entities, but also disambiguating them.  The acronym &quot;ACT&quot; means a lot of different things in different contexts.  Figuring out which sense of a word or phrase is intended is a widely studied problem, usually going under the heading of word sense disambiguation for common nouns or database linkage for proper nouns.  This can be done either via unsupervised clustering or, if there are example contexts, by supervised database linkage.  Luckily, databases such as Entrez and KEGG provide GeneRIFs, which include pointers to articles about specific genes.  And evaluations like &lt;a href=&quot;http://biocreative.sourceforge.net/&quot;&gt;Biocreative&lt;/a&gt; are measuring how well systems can figure out which gene is being mentioned in an article.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.colloquial.com/carp&quot;&gt;Bob Carpenter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.alias-i.com/lingpipe&quot;&gt;Alias-i, Inc.&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 20 Jun 2007 13:01:43 -0400</pubDate>
 <dc:creator>Bob Carpenter</dc:creator>
 <guid isPermaLink="false">comment 3678 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Flawed, but useful analogy: Bloogle</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3674</link>
 <description>&lt;p&gt;I think Google is still a useful analogy for explaining BLAST to wet bench biologists, even if it does have its flaws. As for &quot;Google isn&#039;t statistical&quot;, I disagree. What about all the statistics, probability and machine learning they use to build and improve search results? Google (and other search engines) have very well defined metrics for measuring search quality; it&#039;s not all subjective. So despite its problems, search is still a handy analogy for describing BLAST that many people will be familiar with.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 18 Jun 2007 07:29:40 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3674 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Analogy</title>
 <link>http://www.nodalpoint.org/2007/06/17/blast_is_the_same_as_google_but_for_sequences#comment-3673</link>
 <description>&lt;p&gt;Or, BLAST is like a microwave oven, except for sequences and not frozen burritos.&lt;/p&gt;
&lt;p&gt;But seriously, I think the Google analogy isn&#039;t very good because&lt;br /&gt;
1) You don&#039;t search using a subject in BLAST, but by using another sequence. If Google worked that way, you&#039;d give it a web page and it would find web pages similar to it.&lt;br /&gt;
2) BLAST is statistical, Google isn&#039;t. The only measure of how good a Google search is is the (subjective) opinion of the searcher.&lt;/p&gt;
&lt;p&gt;If you want an analogy, assuming the students know some bench biology, how about &quot;BLAST is an electronic Southern Blot&quot;?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 18 Jun 2007 07:02:50 -0400</pubDate>
 <dc:creator>Jonathan_Badger</dc:creator>
 <guid isPermaLink="false">comment 3673 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>mpiBLAST in EC2</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3534</link>
 <description>&lt;p&gt;I&#039;ve got mpiBLAST working well inside EC2. If you&#039;d like to try it out, you will find this document helpful.&lt;br /&gt;
&lt;a href=&quot;http://mpiblast.pbwiki.com/AmazonEC2&quot; title=&quot;http://mpiblast.pbwiki.com/AmazonEC2&quot;&gt;http://mpiblast.pbwiki.com/AmazonEC2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I will be at BioIT World at the end of April, and happy to discuss this topic with others.&lt;/p&gt;
&lt;p&gt;Mike Cariaso * Bioinformatics Software * &lt;a href=&quot;http://www.cariaso.com&quot; title=&quot;http://www.cariaso.com&quot;&gt;http://www.cariaso.com&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sat, 21 Apr 2007 14:15:31 -0400</pubDate>
 <dc:creator>cariaso</dc:creator>
 <guid isPermaLink="false">comment 3534 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>EC2 MPI quickstart</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3460</link>
 <description>&lt;p&gt;Check out the second part of the tutorial I just posted on &lt;a href=&quot;http://www.datawrangling.com&quot;&gt;Data Wrangling&lt;/a&gt;. It is a bit shorter and should let you get a cluster running in a few minutes using the &lt;a href=&quot;http://developer.amazonwebservices.com/connect/entry.jspa?externalID=705&amp;amp;categoryID=101&quot;&gt;Amazon EC2 public image&lt;/a&gt; published based on the first post.&lt;/p&gt;
&lt;p&gt;There are some Python scripts available on my blog to configure the MPI cluster on EC2, and it looks like you can hack them a bit to &lt;a href=&quot;http://www.mpiblast.org/Docs.Install.html&quot;&gt; configure an EC2 BLAST cluster&lt;/a&gt; based on my tutorial:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.datawrangling.com/mpi-cluster-with-python-and-amazon-ec2-part-2-of-3.html&quot;&gt;MPI Cluster with Python and Amazon EC2 (part 2 of 3)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For message-intensive code, I expect performance will be a bit worse than people with research clusters are used to, considering the effective 250 Mb/s interconnect.  I&#039;ll be doing some benchmarking in the coming weeks.  The advantage of EC2 will be for startups and people who otherwise can&#039;t get access to, or afford to build, a permanent cluster.  As EC2 moves out of Beta, I would guess that there may be some high performance options if Amazon finds enough demand for them. &lt;/p&gt;
&lt;p&gt;-Pete&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 11 Apr 2007 21:53:55 -0400</pubDate>
 <dc:creator>Pete</dc:creator>
 <guid isPermaLink="false">comment 3460 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Re: new blast interface</title>
 <link>http://www.nodalpoint.org/2007/04/02/new_blast_design_to_bereleased_on_april_16#comment-3451</link>
 <description>&lt;p&gt;Finally!!&lt;br /&gt;
I&#039;ve never liked the current interface ;)&lt;br /&gt;
In particular, I hated the pop-up window used to show the results, and I think the interface of the current BLAST is not very clear.&lt;br /&gt;
The new feature to keep track of the recent results sounds cool.&lt;br /&gt;
Thanks ;)&lt;/p&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;WHO AM I?&lt;br /&gt;
Your name is , seeker.  &lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://genome.imim.es/~giovanni&quot; title=&quot;http://genome.imim.es/~giovanni&quot;&gt;http://genome.imim.es/~giovanni&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 03 Apr 2007 10:35:12 -0400</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3451 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>That tutorial is quite</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3417</link>
 <description>&lt;p&gt;That tutorial is quite full-on, a bit too much for me at the moment. I thought I remembered someone offering to transfer their beta EC2 account to you in the comments of your post?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 19 Mar 2007 10:26:38 -0400</pubDate>
 <dc:creator>Greg</dc:creator>
 <guid isPermaLink="false">comment 3417 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Collaboration</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3411</link>
 <description>&lt;p&gt;I still don&#039;t have an EC2 account, but it looks like all the heavy lifting will have been done by the time I get a chance to play with EC2 ... check out Peter Skomoroch&#039;s extensive tutorial &lt;a href=&quot;http://www.datawrangling.com/on-demand-mpi-cluster-with-python-and-ec2-part-1-of-3.html&quot;&gt;&quot;On-Demand MPI Cluster with Python and EC2 (part 1 of 3)&quot;&lt;/a&gt;, if you haven&#039;t caught it already.&lt;/p&gt;
&lt;p&gt;Andrew Perry -- &lt;a href=&quot;http://pansapiens.blogspot.com/&quot; title=&quot;http://pansapiens.blogspot.com/&quot;&gt;http://pansapiens.blogspot.com/&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sun, 18 Mar 2007 15:33:13 -0400</pubDate>
 <dc:creator>pansapiens</dc:creator>
 <guid isPermaLink="false">comment 3411 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>mpiblast and EC2</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3384</link>
 <description>&lt;p&gt;I&#039;m an mpiBLAST developer and a heavy mpiBLAST user. I&#039;m also moderately active with EC2. Despite a lot of interest in merging these two interests together, I haven&#039;t yet found an opportunity. If anyone reading this is considering such a project I&#039;d be interested in comparing notes or collaborating.  &lt;/p&gt;
&lt;p&gt;As for MPI on EC2, as with most parallel apps, it&#039;s a matter of the right topology for the right application. Communication between nodes is going to have fairly high latency compared to more traditional dedicated clusters, but for some apps that&#039;s perfectly acceptable. The benefit of free transfers into and out of S3 is potentially a big win. &lt;/p&gt;
&lt;p&gt;I would like to imagine that we could someday host a shared version of the major blastable DBs in S3, to alleviate the maintenance and transfer costs. For the moment, it&#039;s probably inappropriate, since it&#039;s designed more for total replacement than for incremental growth.&lt;/p&gt;
&lt;p&gt;--&lt;br /&gt;
Mike Cariaso * Bioinformatics Software * &lt;a href=&quot;http://www.cariaso.com&quot; title=&quot;http://www.cariaso.com&quot;&gt;http://www.cariaso.com&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 05 Mar 2007 16:07:14 -0500</pubDate>
 <dc:creator>cariaso</dc:creator>
 <guid isPermaLink="false">comment 3384 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>EC2</title>
 <link>http://www.nodalpoint.org/2007/03/05/virtual_bioinformatics_clusters_with_ec2#comment-3383</link>
 <description>&lt;p&gt;I&#039;ve heard the Amazon folk talk about EC2 a bit and also attended a workshop during Mindcamp.  From what I can gather, S3 would be perfect for a small startup biotech.  Scales well, cheap etc etc.  EC2 might work for a company that does computation in bursts, especially using apps like BLAST, but for a company that is constantly crunching data, I am not yet convinced EC2 is the right solution over something like Sun&#039;s on-demand offering, which is more suited for number crunching apps (as noted in the article, I would not run MPI apps on EC2 based on current knowledge).&lt;/p&gt;
&lt;p&gt;My Blog: &lt;a href=&quot;http://mndoci.com&quot; title=&quot;http://mndoci.com&quot;&gt;http://mndoci.com&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 05 Mar 2007 13:09:52 -0500</pubDate>
 <dc:creator>mndoci</dc:creator>
 <guid isPermaLink="false">comment 3383 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
