quicktree
I'd use MAFFT for aligning something this big.
quicktree was designed for building trees from large datasets like this (Pfam families).
MAFFT
Yep, MAFFT is fast and has excellent alignment quality. My paper from last year showed that it is the best for protein alignment, even for distant sequences.
Phylip
Apart from CD-HIT, which is very good software, I would try using Phylip. The sequences are not long; the problem is the sheer number of them. Try a Neighbor Joining approach in Phylip. It won't be blazing fast, but it will do the job eventually.
I used Phylip to build an NJ tree of a set of 20,000 protein sequences, and it took around 3-4 weeks on a 3GHz Xeon machine.
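To put "the number of sequences is the problem" into numbers, here is a rough back-of-the-envelope sketch in Python. It only counts the entries of a full pairwise distance matrix and compares relative costs under an assumed cubic scaling, which is the textbook cost of classic Neighbor Joining. It is not a measurement of Phylip itself, and the function names are made up for illustration.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs in a full distance matrix."""
    return n * (n - 1) // 2


def relative_cost(n_small: int, n_large: int, exponent: int = 3) -> float:
    """How much more work n_large needs than n_small if cost grows as n**exponent.

    exponent=3 is an assumption: the textbook cost of classic Neighbor Joining.
    """
    return (n_large / n_small) ** exponent


if __name__ == "__main__":
    for n in (2_000, 20_000):
        print(f"{n:>6} sequences -> {pairwise_comparisons(n):,} pairwise distances")
    print(f"cubic cost ratio, 2,000 vs 20,000: {relative_cost(2_000, 20_000):.0f}x")
```

So ten times more sequences means about a hundred times more distances and, with cubic tree building, about a thousand times more work.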
cd-hit
For clustering, CD-HIT is excellent. It is very fast and handles large numbers of sequences. It is used to create the non-redundant datasets in UniProt and at the PDB.
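For anyone wondering what this kind of clustering does, here is a toy sketch of the greedy incremental idea: process the sequences longest first, and let each one either join the first representative it is similar enough to or start a new cluster. This is not CD-HIT's code. The `identity` function is a crude stand-in built on Python's difflib (the real tool uses short-word filtering and alignment), and the 0.9 threshold and the example sequences are just illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a real alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: representative name -> member names."""
    clusters: Dict[str, List[str]] = {}
    # Longest sequences first, so each representative is the longest in its cluster.
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # similar enough: join this cluster
                break
        else:
            clusters[name] = [name]  # no match: becomes a new representative
    return clusters


if __name__ == "__main__":
    example = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFSR",   # "a" minus its last residue
        "c": "GGGGGGGGGGPPPPPPPPPP",    # unrelated to the other two
    }
    print(greedy_cluster(example))
```

Each sequence is only compared against the current representatives rather than against every other sequence, which is why this style of clustering stays fast when most sequences are redundant.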
Phylogeny - I've never gone much beyond Clustal and Phylip, both of which would take hours on an average machine with any more than a few thousand sequences. I've heard good things about MrBayes - which is MPI-enabled, so could run on a cluster if you have access to one.
Huge!
Alf, that is a huge requirement! Are you trying to build a tree of life from the 1000 bases upstream of some housekeeping genes?
I think MUSCLE [ http://www.drive5.com/muscle/ ] can come to the rescue, but you would need a good machine for sure. It uses the log-expectation score as its profile function, which is fast and accurate as well [ http://www.biomedcentral.com/1471-2105/5/113/table/T2 ]. The general algorithm is shown at http://nar.oxfordjournals.org/content/vol32/issue5/images/large/gkh340f2... .
More details in the paper http://nar.oxfordjournals.org/cgi/content/full/32/5/1792 .
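For reference, this is the log-expectation (LE) score for a pair of alignment columns x and y as I remember it from the MUSCLE paper linked above. Treat it as a sketch and check the paper for the authoritative definition.

```latex
LE^{xy} = \left(1 - f^{x}_{G}\right)\left(1 - f^{y}_{G}\right)
          \log \sum_{i} \sum_{j} f^{x}_{i}\, f^{y}_{j}\, \frac{p_{ij}}{p_{i}\, p_{j}}
```

Here f^x_i is the observed frequency of residue i in column x, f^x_G is the fraction of gaps in column x, p_i is the background probability of residue i, and p_ij is the joint probability of residues i and j being aligned. The gap factors down-weight columns that are mostly gaps.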