A new paper by Raghava and Barton, "Quantification of the variation in percentage identity for protein sequence alignments", has just gone online at BMC Bioinformatics.
Initially I was shocked: how, in 2006, could anyone manage to publish anything original about percentage identity (PID), that simple but oft-used (and abused) measure that is fundamental to the definition of the "twilight zone" of sequence similarity (for inferring structural similarity or relatedness from sequence alone)?
Well, it turns out (and it becomes obvious when you try to code it) that there is more than one way to calculate the PID of a multiple sequence alignment, and each method yields different results. Authors rarely state exactly which method they used and, not surprisingly, no matter how you choose to measure the PID, the multiple alignment algorithm used also has a substantial impact.
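To make the "more than one way" point concrete, here is a minimal Python sketch for a single pair of aligned sequences. The function name `pid_variants` and the four denominators are my own illustrative choices (all alignment columns, aligned residue pairs only, length of the shorter sequence, mean sequence length); they are assumptions about commonly used conventions, not necessarily the exact definitions used in the paper. The numerator is the same in every case: the number of columns where both sequences have the same residue.

```python
def pid_variants(a: str, b: str, gap: str = "-") -> dict[str, float]:
    """Percentage identity of two aligned sequences under four denominators.

    Hypothetical helper for illustration; the denominator choices are
    assumptions, not taken from any particular paper or tool.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns where both rows are gaps (possible when the pair
    # is extracted from a larger multiple alignment).
    cols = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]

    # Numerator: columns with the same residue in both sequences.
    identical = sum(1 for x, y in cols if x == y and x != gap)

    # Columns where both sequences have a residue (no gap on either side).
    aligned_pairs = sum(1 for x, y in cols if x != gap and y != gap)

    # Ungapped sequence lengths.
    len_a = sum(1 for x, _ in cols if x != gap)
    len_b = sum(1 for _, y in cols if y != gap)

    denominators = {
        "alignment_columns": len(cols),
        "aligned_pairs": aligned_pairs,
        "shorter_sequence": min(len_a, len_b),
        "mean_length": (len_a + len_b) / 2,
    }
    # Report 0.0 rather than dividing by zero for degenerate input.
    return {
        name: 100.0 * identical / d if d else 0.0
        for name, d in denominators.items()
    }
```

For the toy pair `pid_variants("ACDEFGHIK-", "ACD-FGHLKM")` there are 7 identical columns, and the result is 70.0 over all alignment columns, 87.5 over aligned pairs, and about 77.8 over either the shorter or the mean sequence length. Same alignment, same sequences, and a spread of more than 17 percentage points depending only on the denominator.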

