<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - software - Comments</title>
 <link>http://www.nodalpoint.org/nodalpoint_tags/software</link>
 <description>Comments for &quot;software&quot;</description>
 <language>en</language>
<item>
 <title>You&#039;re welcome</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4203</link>
 <description>&lt;p&gt;You&#039;re welcome and apologies for omitting &quot;-c 0.9&quot; in my first comment.  I&#039;m sure you worked it out.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 04 Oct 2007 03:44:26 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 4203 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks!</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4202</link>
 <description>&lt;p&gt;Thank you Pawel and Neil.&lt;/p&gt;
&lt;p&gt;CD-HIT is exactly what I want, and it works great. The algorithm appears to take some smart short-cuts and reduces calculation time from what would have taken many hours with my half-written brute-force all-against-all pairwise alignment script down to seconds. If CD-HIT is good enough for UniProt, it&#039;s good enough for me :)&lt;/p&gt;
&lt;p&gt;Blastclust looks like another good option (it was even already installed on my machine ... right under my nose), but is orders of magnitude slower, so I&#039;m sticking with CD-HIT for my quick-n-dirty tasks.&lt;/p&gt;
&lt;p&gt;Lucky I didn&#039;t go very far coding something to do this myself (although thinking about the problem was enlightening). I&#039;ve got to remember to listen to my internal &quot;stop-immediately-you-are-reinventing-the-wheel&quot; alarm more often.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 22:37:17 -0400</pubDate>
 <dc:creator>pansapiens</dc:creator>
 <guid isPermaLink="false">comment 4202 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>I second cd-hit</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4201</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://bioinformatics.org/cd-hit/&quot;&gt;CD-HIT&lt;/a&gt; works for me.  Simple as &quot;cd-hit -i file -o file90 -c 0.9 -n 5&quot;.&lt;/p&gt;
&lt;p&gt;Sorry about your comments Pawel, they went to spam for some reason.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 07:10:33 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 4201 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>cd-hit, blastclust</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4200</link>
 <description>&lt;p&gt;I&#039;m not sure if my previous comment was saved... Anyway, I will point again to cd-hit and blastclust. However, the first does not make any alignments, and allows some redundancy in the set anyway.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 06:43:08 -0400</pubDate>
 <dc:creator>pawel</dc:creator>
 <guid isPermaLink="false">comment 4200 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>MAFFT</title>
 <link>http://www.nodalpoint.org/2007/05/22/phylogenetics#comment-3620</link>
 <description>&lt;p&gt;Yep, MAFFT is fast and has excellent alignment quality. My paper from last year showed that it is the best for protein alignment, even for distant sequences.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 31 May 2007 13:04:32 -0400</pubDate>
 <dc:creator>nuin</dc:creator>
 <guid isPermaLink="false">comment 3620 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>quicktree</title>
 <link>http://www.nodalpoint.org/2007/05/22/phylogenetics#comment-3618</link>
 <description>&lt;p&gt;I&#039;d use &lt;a href=&quot;http://align.bmr.kyushu-u.ac.jp/mafft/software/&quot;&gt;MAFFT&lt;/a&gt; for aligning something this big.&lt;br /&gt;
&lt;a href=&quot;http://www.sanger.ac.uk/Software/analysis/quicktree/&quot;&gt;quicktree&lt;/a&gt; was designed for building trees from large datasets like this (&lt;a href=&quot;http://www.sanger.ac.uk/Software/Pfam/&quot;&gt;Pfam&lt;/a&gt; families).&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 30 May 2007 19:07:49 -0400</pubDate>
 <dc:creator>jason</dc:creator>
 <guid isPermaLink="false">comment 3618 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Phylip</title>
 <link>http://www.nodalpoint.org/2007/05/22/phylogenetics#comment-3614</link>
 <description>&lt;p&gt;Apart from CD-HIT, which is very good software, I would try using Phylip. The sequences are not large but the actual number of sequences is the problem. Try using a Neighbor Joining approach in Phylip; it won&#039;t be blazing fast but it will do the job, eventually.&lt;/p&gt;
&lt;p&gt;I used Phylip to build a NJ tree of a set of 20000 protein sequences and it took me around 3-4 weeks to get it done on a 3GHz Xeon machine.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 28 May 2007 12:07:55 -0400</pubDate>
 <dc:creator>nuin</dc:creator>
 <guid isPermaLink="false">comment 3614 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>cd-hit</title>
 <link>http://www.nodalpoint.org/2007/05/22/phylogenetics#comment-3612</link>
 <description>&lt;p&gt;For clustering, &lt;a href=&quot;http://bioinformatics.ljcrf.edu/cd-hi/&quot;&gt;CD-HIT&lt;/a&gt; is excellent.  Very fast, handles many sequences.  Used to create the non-redundant datasets in UniProt and at the PDB.&lt;/p&gt;
&lt;p&gt;Phylogeny - I&#039;ve never gone much beyond Clustal and Phylip, both of which would take hours on an average machine with any more than a few thousand sequences.  I&#039;ve heard good things about &lt;a href=&quot;http://mrbayes.csit.fsu.edu/&quot;&gt;MrBayes&lt;/a&gt; - which is MPI-enabled, so could run on a cluster if you have access to one.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sat, 26 May 2007 09:48:06 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 3612 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Huge!</title>
 <link>http://www.nodalpoint.org/2007/05/22/phylogenetics#comment-3611</link>
 <description>&lt;p&gt;Alf, that is a huge requirement! Are you trying to make a tree of life with 1000 bases upstream of some house-keeping genes?&lt;br /&gt;
I think MUSCLE [ &lt;a href=&quot;http://www.drive5.com/muscle/&quot; title=&quot;http://www.drive5.com/muscle/&quot;&gt;http://www.drive5.com/muscle/&lt;/a&gt; ] can come to the rescue, but you would need a good machine for sure. It uses log-expectation as its profile function, which is faster and also accurate [ &lt;a href=&quot;http://www.biomedcentral.com/1471-2105/5/113/table/T2&quot; title=&quot;http://www.biomedcentral.com/1471-2105/5/113/table/T2&quot;&gt;http://www.biomedcentral.com/1471-2105/5/113/table/T2&lt;/a&gt; ]. The general algorithm is shown in &lt;a href=&quot;http://nar.oxfordjournals.org/content/vol32/issue5/images/large/gkh340f2.jpeg&quot; title=&quot;http://nar.oxfordjournals.org/content/vol32/issue5/images/large/gkh340f2.jpeg&quot;&gt;http://nar.oxfordjournals.org/content/vol32/issue5/images/large/gkh340f2...&lt;/a&gt; .&lt;br /&gt;
More details in the paper &lt;a href=&quot;http://nar.oxfordjournals.org/cgi/content/full/32/5/1792&quot; title=&quot;http://nar.oxfordjournals.org/cgi/content/full/32/5/1792&quot;&gt;http://nar.oxfordjournals.org/cgi/content/full/32/5/1792&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;______________________&quot;The Answer Lies in Genome&quot;______________________&lt;br /&gt;
 &lt;a href=&quot;http://computationalbiologynews.blogspot.com/&quot; title=&quot;http://computationalbiologynews.blogspot.com/&quot;&gt;http://computationalbiologynews.blogspot.com/&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sat, 26 May 2007 01:25:17 -0400</pubDate>
 <dc:creator>Animesh</dc:creator>
 <guid isPermaLink="false">comment 3611 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Taverna</title>
 <link>http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile#comment-3459</link>
 <description>&lt;p&gt;I don&#039;t get it. The diagram is too complex? Of course it is complex. But everything is pointing towards a workflow (SOA) world. Why does bioinformatics deny this turn? Oracle, SAP, IBM and so on are all rewriting their applications so they can use BPEL as a graphical build environment for integration processes. So stay stuck in the 90s and keep using scripts, but within a few years we will all use workflows in a SOA world. I really believe this will happen; IBM believes in it, and so does the whole IT world. Let&#039;s see who&#039;s right.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 11 Apr 2007 16:35:06 -0400</pubDate>
 <dc:creator>mart1nus</dc:creator>
 <guid isPermaLink="false">comment 3459 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Re: new blast interface</title>
 <link>http://www.nodalpoint.org/2007/04/02/new_blast_design_to_bereleased_on_april_16#comment-3451</link>
 <description>&lt;p&gt;Finally!!&lt;br /&gt;
I&#039;ve never liked the current interface ;)&lt;br /&gt;
In particular, I hated the pop-up window used to show the results, and I think the interface of the current BLAST is not very clear.&lt;br /&gt;
The new feature to keep track of the recent results sounds cool.&lt;br /&gt;
Thanks ;)&lt;/p&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;WHO AM I?&lt;br /&gt;
Your name is , seeker.  &lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://genome.imim.es/~giovanni&quot; title=&quot;http://genome.imim.es/~giovanni&quot;&gt;http://genome.imim.es/~giovanni&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 03 Apr 2007 10:35:12 -0400</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3451 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>A hack (not too kludgy)</title>
 <link>http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile#comment-3439</link>
 <description>&lt;p&gt;Designate one of the multiple outputs as a representative file (say foo.psl), touch the other files at the end of the commands, and have the rule: &lt;code&gt;foo.log foo.whatever ...: foo.psl&lt;/code&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 28 Mar 2007 09:02:33 -0400</pubDate>
 <dc:creator>pjw</dc:creator>
 <guid isPermaLink="false">comment 3439 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Doesn&#039;t change a thing</title>
 <link>http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile#comment-3436</link>
 <description>&lt;p&gt;Unfortunately, what you wrote is equivalent to what I wrote. It&#039;s shorthand notation for two separate rules; I should have explained that right away. It is going to execute the command twice. What we need is a way to express that a command has multiple outputs, and rules with multiple targets don&#039;t accomplish that.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 23 Mar 2007 00:22:26 -0400</pubDate>
 <dc:creator>Antonio Piccolboni</dc:creator>
 <guid isPermaLink="false">comment 3436 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>makeovers</title>
 <link>http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile#comment-3435</link>
 <description>&lt;p&gt;We certainly would want the replacement to look almost (if not exactly) like make. It should definitely be as simple as make -- that&#039;s probably the primary design goal: that you can cut and paste from the command line to a pipeline description file, with minimal extra typing. I&#039;m fully aware of the dangers of excessive re-engineering...&lt;/p&gt;
&lt;p&gt;Thanks for the other links... we&#039;ll certainly look into them, and post discussion summaries on biowiki.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://biowiki.org/IanHolmes&quot; title=&quot;http://biowiki.org/IanHolmes&quot;&gt;http://biowiki.org/IanHolmes&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 21 Mar 2007 12:26:55 -0400</pubDate>
 <dc:creator>Ian Holmes</dc:creator>
 <guid isPermaLink="false">comment 3435 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Why not scripting?</title>
 <link>http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefile#comment-3434</link>
 <description>&lt;p&gt;Scripting is great, a powerful tool that lets you achieve world peace in three lines of Perl/Python/Ruby. But what if people don&#039;t want to hack scripts? According to Grady Booch, the history of software engineering is one of increasing levels of abstraction, which is where Taverna and workflows are trying to go. Admittedly, we&#039;re not quite there yet: sometimes the &lt;a href=&quot;http://www.joelonsoftware.com/articles/LeakyAbstractions.html&quot; title=&quot;The law of leaky abstractions&quot;&gt;abstractions leak&lt;/a&gt;, and as Stew says, you end up &lt;a href=&quot;http://www.ghastlyfop.com/blog/2005/12/workflows-grid-services.html&quot;&gt;hacking BeanShell&lt;/a&gt;. That&#039;s not really a problem with Taverna or workflows; it&#039;s an inherent problem in bioinformatics data, a flat-file legacy nightmare, which means we&#039;ll be forced to hack scripting languages for a long time to come, whether we like it or not.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 21 Mar 2007 04:20:40 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3434 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
