Alignment Uncertainty

January 25, 2008

Today in Science, Wong et al. have a brief three-page report entitled “Alignment Uncertainty and Genomic Analysis”. In the same issue (pg 416; doi 10.1126/science.1153156), Rokas writes a perspective on Wong’s report. [Ironically, Rokas’ perspective probably runs to about the same amount of text as Wong’s report, since the report includes two large figures.] The report makes a simple assertion: “methods applied to the analysis of genomic data do not account for uncertainty in the sequence alignment.” The authors then show, by applying seven popular alignment programs to protein sequences from seven yeast species, that uncertainty in the alignment can lead to problems, including different alignments supporting different conclusions.

Interestingly, they show that alignment variability, as reflected by the marginal posterior probability distribution of alignments (calculated by BAli-Phy), was associated with the inconsistency among the alignments produced by the seven different alignment methods. BAli-Phy’s model jointly estimates the alignment and phylogeny.
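To make "inconsistency of alignments" concrete, here is a minimal sketch of one way to score how much two alignments of the same sequences disagree. It compares the sets of residue pairs that each alignment places in the same column. This is an illustrative measure only, not necessarily the one used in the report; the function names `aligned_pairs` and `alignment_distance` and the toy alignments are my own assumptions.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs an alignment places in the same column.

    `alignment` is a sequence of equal-length gapped strings, one per sequence.
    Each residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position in each sequence
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for seq_index, char in enumerate(column):
            if char != "-":
                residues.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # Every pair of residues sharing a column is a homology statement.
        pairs.update(combinations(residues, 2))
    return pairs


def alignment_distance(a, b):
    """1 minus the Jaccard similarity of aligned residue pairs (0 = identical)."""
    pairs_a, pairs_b = aligned_pairs(a), aligned_pairs(b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return 1.0 - len(pairs_a & pairs_b) / len(union)


# Hypothetical output of two alignment programs on the same two sequences.
program_1 = ["AC-GT", "ACTGT"]
program_2 = ["ACG-T", "ACTGT"]
print(alignment_distance(program_1, program_2))
```

In this toy case the two alignments share three of the five distinct residue pairs, so they disagree only about where the gap goes. Averaging such a distance over all pairs of programs gives a rough per-gene inconsistency score.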

Wong correctly asserts that this problem will not be resolved adequately by simply discarding information (filtering). Instead, the uncertainty needs to be factored into the analysis, traditionally by treating the parameter (in this case the alignment) as a random variable. Quoting Wong’s conclusion, “Allowing for uncertainty in the alignment and, possibly, phylogeny simultaneously, through statistical phylo-alignment, should be of special importance in comparative genomics studies.”
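As a rough sketch of what treating the alignment as a random variable means in practice: rather than computing a statistic on one "best" alignment, compute it on many alignments sampled from the posterior and summarize the spread. The code below assumes the samples are equally weighted draws (for example, from an MCMC sampler). The helper names and the two toy samples are hypothetical, and percent identity is just a stand-in for whatever downstream quantity is of interest.

```python
from statistics import mean


def percent_identity(alignment):
    """Fraction of gap-free columns in a pairwise alignment whose residues match."""
    columns = [(x, y) for x, y in zip(*alignment) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def summarize_over_alignments(samples, statistic):
    """Summarize a statistic across sampled alignments instead of a single one."""
    values = sorted(statistic(alignment) for alignment in samples)
    return {"mean": mean(values), "min": values[0], "max": values[-1]}


# Two made-up posterior samples of the same pairwise alignment.
samples = [
    ("AC-GT", "ACTGT"),
    ("ACG-T", "ACTGT"),
]
print(summarize_over_alignments(samples, percent_identity))
```

If the statistic barely moves across samples, the conclusion is robust to alignment uncertainty; if it swings widely, a single point-estimate alignment would have hidden that.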

Treating the alignment as a random variable is intrinsic to statistical alignment procedures. The classic example is the TKF91 model (Thorne, Kishino, and Felsenstein; J. Mol. Evol.), which uses an explicit birth-death model for indels and other events. Several groups have built on this work:

  1. Holmes and Bruno, Bioinformatics 17(9): 803-820 (2001)
  2. Lunter et al. BMC Bioinformatics 6:83 (2005)
  3. Fleissner et al. Syst. Biol. 54(4):548-561 (2005)
  4. Suchard and Redelings, Bioinformatics 22(16):2047-8 (2006)

just to name a few (see other citations from these groups). Recently, Bradley and Holmes (Bioinformatics 23: 3258-3262; doi 10.1093/bioinformatics/btm402) described transducers, a unifying probabilistic framework for comparative sequence analysis. Statistical alignment can be formulated as a transducer. These models are computationally expensive and will likely require constraints to be made practical.
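For readers who have not seen a birth-death indel model like the one in TKF91, here is a minimal simulation sketch of sequence length only. It assumes the usual "links" description: each residue (mortal link) is deleted at rate mu, and each residue, plus one immortal link at the left end, spawns an insertion at rate lam, with lam < mu. The function name and parameter values are my own, and substitutions and the actual alignment likelihood are left out entirely.

```python
import random


def simulate_tkf91_length(n0, lam, mu, t, rng):
    """Simulate sequence length after time t under a TKF91-style birth-death process.

    n0  : starting number of residues (mortal links)
    lam : per-link insertion (birth) rate
    mu  : per-link deletion (death) rate
    rng : a random.Random instance, passed in for reproducibility
    """
    n, elapsed = n0, 0.0
    while True:
        birth = (n + 1) * lam  # mortal links plus the immortal link can insert
        death = n * mu         # only mortal links can be deleted
        total = birth + death
        elapsed += rng.expovariate(total)  # waiting time to the next event
        if elapsed > t:
            return n
        if rng.random() < birth / total:
            n += 1
        else:
            n -= 1


rng = random.Random(42)
lengths = [simulate_tkf91_length(10, lam=1.0, mu=2.0, t=20.0, rng=rng) for _ in range(1000)]
print(sum(lengths) / len(lengths))
```

With lam < mu the length settles toward a geometric equilibrium distribution, so over a long time the starting length is forgotten. The expense mentioned above comes from the other direction: summing over all indel histories that could relate the observed sequences.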

Thorne, J.L., Kishino, H., Felsenstein, J. (1991). An evolutionary model for maximum likelihood alignment of DNA sequences. Journal of Molecular Evolution, 33(2), 114-124. DOI: 10.1007/BF02193625

Wong, K.M., Suchard, M.A., Huelsenbeck, J.P. (2008). Alignment Uncertainty and Genomic Analysis. Science, 319(5862), 473-476. DOI: 10.1126/science.1151532

