A new paper by Raghava and Barton, "Quantification of the variation in percentage identity for protein sequence alignments", has just gone online at BMC Bioinformatics.
Initially I was shocked: how, in 2006, could anyone manage to publish anything original about percentage identity (PID), that simple but oft used/abused measure that is fundamental to the definition of the "twilight zone" of sequence similarity (for inferring structural similarity or relatedness from sequence alone)?
Well, it turns out (and becomes obvious when you try to code it) that there is more than one way to calculate the PID of a multiple sequence alignment, and each method yields different results. Authors rarely state exactly which method they used and, not surprisingly, no matter how you choose to measure the PID, the multiple alignment algorithm used also has a substantial impact.
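To make the ambiguity concrete, here is a minimal Python sketch for a single pair of aligned sequences. The function name `pid_variants` and the four denominators are illustrative choices of commonly seen conventions, not necessarily the exact definitions used in the paper.

```python
def pid_variants(a, b, gap="-"):
    """Return percentage identity for two aligned sequences under
    four different denominators (all hypothetical labels)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")

    pairs = list(zip(a, b))
    # Columns where both sequences have the same residue (not a gap).
    identical = sum(1 for x, y in pairs if x == y and x != gap)
    # Columns where both sequences have a residue.
    aligned_pairs = sum(1 for x, y in pairs if x != gap and y != gap)
    # Columns where at least one sequence has a residue.
    columns = sum(1 for x, y in pairs if not (x == gap and y == gap))
    # Ungapped sequence lengths.
    len_a = len(a) - a.count(gap)
    len_b = len(b) - b.count(gap)

    def pct(denominator):
        # Guard against empty input rather than dividing by zero.
        return 100.0 * identical / denominator if denominator else 0.0

    return {
        "aligned_pairs": pct(aligned_pairs),
        "shorter_sequence": pct(min(len_a, len_b)),
        "mean_length": pct((len_a + len_b) / 2),
        "alignment_columns": pct(columns),
    }


if __name__ == "__main__":
    for name, value in pid_variants("ACDEFGHIK-", "ACDE--HIKL").items():
        print(f"{name:18s} {value:6.2f}")
```

For this toy pair the same alignment comes out anywhere between 70% and 100% identical depending on the denominator, which is exactly why an unqualified "PID" is hard to compare between papers.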
The punchline is that a Z-score (determined via comparison to shuffled sequences) gives a better measure of similarity when trying to infer structural similarity from sequences alone. I personally can't see PID measures going away any time soon, since their use is entrenched and the basic concept is still quite a powerful way to communicate similarity to broader audiences (i.e. bench biologists).
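As a rough illustration of the shuffling idea, the sketch below scores a pair of sequences, rescores against shuffled copies of one of them, and reports how many standard deviations the real score sits above the shuffled mean. Everything here is an assumption for illustration: the function names `nw_score` and `shuffle_z_score`, the simple match/mismatch/gap scoring, and the number of shuffles are not taken from the paper, which may use a different alignment method and scoring scheme.

```python
import random
import statistics


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap
    penalty. The scoring values are arbitrary illustrative defaults."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_z_score(a, b, n_shuffles=200, seed=0):
    """Z-score of the real alignment score against scores obtained
    by aligning `a` to shuffled versions of `b`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = nw_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        # Shuffling keeps length and composition but destroys order.
        rng.shuffle(residues)
        background.append(nw_score(a, "".join(residues)))
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mu) / sigma


if __name__ == "__main__":
    seq_a = "MKVLAAGIVGLLLAQPSNAHE"
    seq_b = "MKVLATGIVGLLAQPSNAHE"
    print(f"Z = {shuffle_z_score(seq_a, seq_b):.2f}")
```

Shuffling preserves the length and composition of the sequence, so the Z-score asks whether the observed similarity is more than you would expect from composition alone.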
Nonetheless, since this work comes out of the Barton group, home of Jalview, let's hope they implement their Z-score measure in that software, allowing it to get more widespread use.
Not really earth-shattering findings, but good, fundamental stuff.


Comments
Z-scores eek....
There is more than one way of calculating a Z-score as well, and the distribution of randomized alignment values is not perfectly normal, so the question is whether the Z-score is the right measure to begin with. Don't get me wrong, PID calculations have their problems too: local vs global alignment, definition of the overlap region, etc. But PID values have a nice intuitive feel for any biologist; the same can't be said about Z-scores.
new journal needed
There should be a new journal for this sort of thing: "Journal of non-earth shattering results", perhaps. On the other hand, authors worried about their output can gain some comfort (and start writing up some of their more obvious findings).
Remember Katoh?
I will remind you of the Katoh post [ http://www.nodalpoint.org/2006/04/07/how_to_get_many_pubmed_entries_when... ].
By the way guys, did you read Dilbert's blog [ http://dilbertblog.typepad.com/the_dilbert_blog/2006/09/answer_to_the_p...., ]? Moist robots... are we? May be worse...
______________________"The Answer Lies in Genome"______________________
http://sharma.animesh.googlepages.com/