I was looking for the best way to explain what BLAST is to people with no background in biology or bioinformatics.
I thought I could say it's a search engine:
- BLAST is like Google, but for biological sequences instead of search terms.

I knew that sequencers are getting cheaper all the time, but in Genome Technology this week they're talking with the inventors of the technology that 454 is licensing, discussing future pyrosequencing updates and how those should lead to very cheap machines. The prospect is that any lab could sequence its own genome in three years. The technology seems almost ready: basically a cheaper and smaller version of 454's current machines. If you believe that sequence databases are exploding at the moment, better prepare for a new wave.
I've recently been to a cool workshop called "RegCreative". The idea was to mass-curate papers into a new database. There we had the usual discussion of "open" (Oreganno) versus "private" (Transfac) databases, and the open one in this case is still far from big enough, but that's not my main point here.
I liked the workshop because we actually spent a lot of time at the computer and reading papers. There were no big stars, impressive results, great publications or hypotheses, mainly people who presented their own databases ("I've spent 500 hours to create my database" (flytf), "I read 120 papers" (flyreg), etc.), and afterwards everyone would get back to their computers and try to enter one of the papers from the big pile at the entrance. The problem of database curation became very obvious to all participants, and they got more tired of reading papers with every day that passed... (Here is a picture from the beginning, when people were still discussing :-)
Given that alignment is the backbone of sequence analysis, and as such of bioinformatics, this news has the potential to shake some ground in the community: a new company claims that its new BLAST is as sensitive as the original but 10,000 times faster, thanks to an improved seed searching phase. I'm very sceptical... local alignment has been worked on for decades without much speed improvement at constant accuracy, which is why there are special hardware solutions for this problem (I wonder who is buying them).
But the company's website does not look like a joke and they're shipping demo versions... I have no clue how that could work. Suffix arrays? But those have already been applied to local alignment seed search, right? Anyone out there with rumors about this company?
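For readers unfamiliar with what the "seed searching phase" refers to, here is a minimal sketch of the classic idea: index every word of length k in the database sequence, then look up each word of the query to get candidate positions that an extension step would later try to grow into a local alignment. The function names and the toy sequences are made up for illustration, the sketch only handles exact nucleotide words, and it says nothing about how the company in question does it.

```python
from collections import defaultdict


def build_word_index(subject: str, k: int) -> dict:
    """Map every length-k word of the subject to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query: str, index: dict, k: int) -> list:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for s in index.get(query[q:q + k], ()):
            seeds.append((q, s))
    return seeds


if __name__ == "__main__":
    subject = "GATTACAGATTACA"  # toy "database" sequence
    index = build_word_index(subject, k=4)
    print(find_seeds("TTACAG", index, k=4))
```

The lookup itself is cheap; most of the running time of a real search goes into deciding which seeds are worth extending and into the extension, which is why a large speed-up from seeding alone would be surprising.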
A new paper by Raghava and Barton has just gone online at BMC Bioinformatics: "Quantification of the variation in percentage identity for protein sequence alignments".
Initially I was shocked... how, in 2006, could anyone manage to publish anything original about percentage identity (PID), that simple but oft used/abused measure that is fundamental to the definition of the "twilight zone" of sequence similarity (for inferring structural similarity or relatedness from sequence alone)?
Well, it turns out (and becomes obvious when you try to code it) that there is more than one way to calculate the PID of a multiple sequence alignment, and each method yields different results. Authors rarely state exactly which method they used and, not surprisingly, no matter how you choose to measure the PID, the multiple alignment algorithm used also has a substantial impact.
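To make the ambiguity concrete, here is a small sketch that computes the PID of one aligned pair using four different denominators. The choice of denominators (alignment length, aligned residue pairs, shorter sequence, mean sequence length) is my own selection of commonly used ones and is not guaranteed to match the paper's definitions exactly; the function name and the toy sequences are made up.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> dict:
    """PID of two aligned rows, once per denominator convention."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Rows cut out of a multiple alignment can share all-gap columns; drop them.
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    identical = sum(1 for x, y in columns if x == y)
    aligned_pairs = sum(1 for x, y in columns if x != gap and y != gap)
    len_a = sum(1 for x, _ in columns if x != gap)
    len_b = sum(1 for _, y in columns if y != gap)
    denominators = {
        "alignment_length": len(columns),      # gapped columns count
        "aligned_pairs": aligned_pairs,        # gapped columns ignored
        "shorter_sequence": min(len_a, len_b),
        "mean_length": (len_a + len_b) / 2,
    }
    return {name: 100.0 * identical / d if d else 0.0
            for name, d in denominators.items()}


if __name__ == "__main__":
    for name, pid in percent_identity("ACDEFGH-K", "ACD--GHIK").items():
        print(f"{name:17s} {pid:6.2f}")
```

For this toy pair the same six identical positions give roughly 67%, 100%, 86% and 80% depending on the denominator, which is the whole problem in a nutshell.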
Based on a comparison of different sequencing strategies in six small marine microbial genomes, the paper evaluated the utility and cost-effectiveness of a hybrid sequencing approach, using 3730xl Sanger sequencing and 454 runs, to generate higher-quality, lower-cost assemblies compared to current Sanger-only sequencing strategies. For genomes larger than 3 Mb with many sequencing gaps and hard stops, the strategy of 5.3X Sanger sequencing plus two 454 runs is the best choice.
Proc Natl Acad Sci U S A. 2006 Jul 13; [Epub ahead of print]
A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes.
Way back in Nodalpoint history (probably about a year ago), we had "paper of the month" posts. They were rarely monthly and often involved more than one paper, but I always thought that it was a nice idea. So here are a few that caught my eye this month.

Grid computing already plays an important role in the life sciences, and will probably continue to do so for the foreseeable future. BioGrid (Japan), myGrid (UK) and CoreGrid (Europe) are just three current examples; there are many more Grid and Super Duper Computer projects in the life sciences. So, is there an accessible Hitchhiker's Guide to the Grid for newbies, especially bioinformaticians?
This is an announcement of a bioinformatics tool, published as freeware for the community.
Geneious is an easy-to-use, cross-platform (Windows, OS X, Unix) bioinformatics data analysis and visualization tool. It has an open API for writing plugins. You can use Geneious to compare genes from different species, to build an evolutionary tree to see how closely related they are, or to search for literature on any topic in medicine or biology. You can view and extract gene annotations from whole genomes, and interactive 3D graphics allow you to move around protein structures.
Version 1.0 of Geneious has just been published as freeware. Biomatters hopes that you will put the program to good use in your research, and we are eager to hear your comments and feedback.
There are indeed quite a few multiple alignment algorithms. Wallace et al. counted around 50. They started drawing trees of alignment algorithms.
Has anyone collected a list of all papers that developed a new alignment algorithm (each one, of course, better than a couple of the others)? I personally would bet that, given that the number of algorithms rises with algorithmic simplicity and the interpretability of the results, motif discovery on DNA is one of the disciplines that has generated the most papers about a new algorithm (counted around 80). Fortunately, the decision is much simpler, as the savvy bio-computerfreak knows: with so much choice, the first program that gently compiles after "make" has a good chance of getting used in the end. Is this the reason why everyone is using BLAST today? Or was that a completely different time?