Does anyone have a good (free, open source) software solution for reducing the redundancy of a set of sequences (e.g. returning a set in which no two sequences are more than 90% identical, based on pairwise alignment)?
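To make the question concrete, here is a minimal Python sketch of one way such a filter could work: a greedy pass that keeps a sequence only if it is at most 90% identical to every sequence already kept. Everything in it is an assumption made for illustration. The function names are hypothetical, and identity is approximated as the longest common subsequence divided by the length of the shorter sequence, which is a crude stand-in for a properly scored pairwise alignment. It is also quadratic in the number of sequences, so it only suits small sets.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Used here as a crude stand-in for the number of matching
    positions in a pairwise alignment (assumption, not a scored alignment).
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a, b):
    """Fraction of the shorter sequence matched by the other one."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def reduce_redundancy(seqs, threshold=0.9):
    """Return names of a subset in which no pair exceeds the threshold.

    seqs maps a name to a sequence string. Longer sequences are visited
    first (ties broken by name), so they become the representatives.
    """
    kept = []
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        if all(identity(seq, seqs[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    example = {
        "a": "ACGTACGTAC",
        "b": "ACGTACGTAC",  # exact duplicate of "a"
        "c": "TTTTTTTTTT",
    }
    print(reduce_redundancy(example))
```

Visiting the longest sequences first means each discarded sequence is represented by something at least as long as itself; a different ordering would give a different, equally valid subset.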
If you had 100,000 DNA sequences, each 1000 nucleotides long, and you wanted to cluster them or create a phylogeny, which software would you use?
Also, what if they were amino acid sequences, rather than nucleotides?
April 2, 2007: New BLAST design to be released on April 16, 2007
----------------------------------------------------------------
The new NCBI BLAST pages will become the default interface at
http://ncbi.nlm.nih.gov/blast on April 16, 2007. The new
interface is currently available as a beta release at
http://ncbi.nlm.nih.gov/blast/beta/. For details on the new
interface, see http://www.ncbi.nlm.nih.gov/BLAST/beta/about/.
After the new interface is released, the previous interface will
remain available from a link on the new front page until May 14,
2007.
A Note About URLAPI
What is a pipeline? For me, it's a series of steps that munch DNA/protein data, combine it with other data using various small scripts, and output the results as diagrams or HTML. Do we want to code this kind of software as a script? If you think "makefile!" now, then you're much more clever than I was. Personally, until recently, I glued my scripts together using other scripts and used makefiles only for compiling my programs. That was a bad idea. (It's quite a detailed post; click on "read more" for the full article.)
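For readers who haven't met the idea behind "makefile!", the following Python sketch imitates the core rule make applies: a step is re-run only when its output is missing or older than one of its inputs. It is an illustration under stated assumptions, not the method from the full article. The names `needs_rebuild`, `build`, and the `rules` dictionary are hypothetical, and real make does far more (pattern rules, parallel jobs, and so on).

```python
import os


def needs_rebuild(target, deps):
    """True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(d) > target_mtime for d in deps)


def build(target, rules, log=None):
    """Bring target up to date and return the list of targets rebuilt.

    rules maps a target path to (deps, action), where action(target, deps)
    creates the target file. Paths without a rule are treated as source
    files and must already exist.
    """
    if log is None:
        log = []
    if target not in rules:
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return log
    deps, action = rules[target]
    for dep in deps:
        # Make sure every input is current before checking this step.
        build(dep, rules, log)
    if needs_rebuild(target, deps):
        action(target, deps)
        log.append(target)
    return log
```

With rules for, say, an alignment step and an HTML report step, calling `build` on the report re-runs only the steps whose inputs changed, which is exactly the bookkeeping that hand-written glue scripts tend to get wrong.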