<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - Redundancy reduction of sequence sets - Comments</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets</link>
 <description>Comments for &quot;Redundancy reduction of sequence sets&quot;</description>
 <language>en</language>
<item>
 <title>You&#039;re welcome</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4203</link>
 <description>&lt;p&gt;You&#039;re welcome and apologies for omitting &quot;-c 0.9&quot; in my first comment.  I&#039;m sure you worked it out.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 04 Oct 2007 03:44:26 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 4203 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks!</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4202</link>
 <description>&lt;p&gt;Thank you Pawel and Neil.&lt;/p&gt;
&lt;p&gt;CD-HIT is exactly what I want, and it works great. The algorithm appears to take some smart short-cuts and reduces calculation time from what would have taken many hours with my half-written brute-force all-against-all pairwise alignment script down to seconds. If CD-HIT is good enough for UniProt, it&#039;s good enough for me :)&lt;/p&gt;
&lt;p&gt;Blastclust looks like another good option (it was even already installed on my machine ... right under my nose), but is orders of magnitude slower, so I&#039;m sticking with CD-HIT for my quick-n-dirty tasks.&lt;/p&gt;
&lt;p&gt;Lucky I didn&#039;t go very far coding something to do this myself (although thinking about the problem was enlightening). I&#039;ve got to remember to listen to my internal &quot;stop-immediately-you-are-reinventing-the-wheel&quot; alarm more often.&lt;/p&gt;
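&lt;p&gt;For anyone curious what the brute-force route looks like, here is a minimal Python sketch of greedy redundancy reduction. It is not CD-HIT&#039;s algorithm and not the half-written script mentioned above. The names reduce_redundancy and similarity are made up for illustration, and difflib.SequenceMatcher.ratio() is only a rough stand-in for percent identity from a real pairwise alignment.&lt;/p&gt;

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio stands in for alignment-based identity.
    # A real tool would align the two sequences and count identical columns.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(seqs: Dict[str, str], threshold: float = 0.9) -> List[str]:
    # Greedy pass: visit the longest sequences first and keep a sequence only
    # if no already-kept representative is more than `threshold` similar.
    kept: List[Tuple[str, str]] = []
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        if not any(similarity(seq, rep) > threshold for _, rep in kept):
            kept.append((name, seq))
    return [name for name, _ in kept]


if __name__ == "__main__":
    # Toy input, invented for this example only.
    example = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSRA",
        "c": "GGGGGGGGGGPPPPPPPPPP",
    }
    print(reduce_redundancy(example))
```

&lt;p&gt;Every candidate is compared against every kept representative, so the cost grows roughly quadratically with the number of sequences, which is why this kind of script gets slow on large sets.&lt;/p&gt;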
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 22:37:17 -0400</pubDate>
 <dc:creator>pansapiens</dc:creator>
 <guid isPermaLink="false">comment 4202 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>I second cd-hit</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4201</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://bioinformatics.org/cd-hit/&quot;&gt;CD-HIT&lt;/a&gt; works for me.  Simple as &quot;cd-hit -i file -o file90 -c 0.9 -n 5&quot;.&lt;/p&gt;
&lt;p&gt;Sorry about your comments Pawel, they went to spam for some reason.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 07:10:33 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 4201 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>cd-hit, blastclust</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comment-4200</link>
 <description>&lt;p&gt;I&#039;m not sure if my previous comment was saved... Anyway, I will point again to cd-hit and blastclust. However, the former does not perform any alignments, and it still allows some redundancy in the set.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 03 Oct 2007 06:43:08 -0400</pubDate>
 <dc:creator>pawel</dc:creator>
 <guid isPermaLink="false">comment 4200 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Redundancy reduction of sequence sets</title>
 <link>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets</link>
 <description>&lt;p&gt;Does anyone have a good (free, open source) software solution to reduce the redundancy of a set of sequences (e.g. return a set where no two sequences are more than 90% identical, based on a pairwise alignment)?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <comments>http://www.nodalpoint.org/2007/10/02/redundancy_reduction_of_sequence_sets#comments</comments>
 <category domain="http://www.nodalpoint.org/forums/discussion/bioinformatics_0">Bioinformatics</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/bioinformatics">bioinformatics</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/software">software</category>
 <pubDate>Tue, 02 Oct 2007 20:37:50 -0400</pubDate>
 <dc:creator>pansapiens</dc:creator>
 <guid isPermaLink="false">2296 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
