
General continuous-time Markov model of sequence evolution via insertions/deletions: are alignment probabilities factorable?

Overview of attention for article published in BMC Bioinformatics, September 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (88th percentile)
  • High Attention Score compared to outputs of the same age and source (93rd percentile)

Mentioned by

  • Blogs: 1 blog
  • X (Twitter): 12 X users
  • Google+: 1 Google+ user
  • Video: 1 YouTube creator

Citations

  • Dimensions: 7 citations

Readers on

  • Mendeley: 43 readers
Published in
BMC Bioinformatics, September 2016
DOI 10.1186/s12859-016-1105-7
Authors

Kiyoshi Ezawa

Abstract

Insertions and deletions (indels) account for more nucleotide differences between two related DNA sequences than substitutions do, and thus it is imperative to develop a stochastic evolutionary model that enables us to reliably calculate the probability of the sequence evolution through indel processes. Recent probabilistic indel models are mostly based on either hidden Markov models (HMMs) or transducer theories, both of which give the indel component of the probability of a given sequence alignment as a product of either probabilities of column-to-column transitions or block-wise contributions along the alignment. However, it is not a priori clear how these models are related to any genuine stochastic evolutionary model, which describes the stochastic evolution of an entire sequence along the time-axis. Moreover, currently none of these models can fully accommodate biologically realistic features, such as overlapping indels, power-law indel-length distributions, and indel rate variation across regions. Here, we theoretically dissect the ab initio calculation of the probability of a given sequence alignment under a genuine stochastic evolutionary model, more specifically, a general continuous-time Markov model of the evolution of an entire sequence via insertions and deletions. Our model is a simple extension of the general "substitution/insertion/deletion (SID) model". Using the operator representation of indels and the technique of time-dependent perturbation theory, we express the ab initio probability as a summation over all alignment-consistent indel histories. Exploiting the equivalence relations between different indel histories, we find a "sufficient and nearly necessary" set of conditions under which the probability can be factorized into the product of an overall factor and the contributions from regions separated by gapless columns of the alignment, thus providing a sort of generalized HMM.
The conditions distinguish evolutionary models with factorable alignment probabilities from those without. The former category includes the "long indel" model (a space-homogeneous SID model) and the model used by Dawg, a genuine sequence evolution simulator. With intuitive clarity and mathematical precision, our theoretical formulation will help further advance the ab initio calculation of alignment probabilities under biologically realistic models of sequence evolution via indels.
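
The factorized form described in the abstract (an overall factor times contributions from regions separated by gapless columns) can be pictured with a small Python sketch. This is only an illustration of the bookkeeping, not the paper's method: the function names, the `-` gap symbol, and the caller-supplied `region_factor` are all assumptions introduced here, and the actual per-region contributions (sums over alignment-consistent indel histories) are not given in this summary.

```python
import math
from typing import Callable, List, Tuple

Region = Tuple[str, str]  # the two alignment rows restricted to one gapped region


def split_at_gapless_columns(row_a: str, row_b: str, gap: str = "-") -> List[Region]:
    """Return the maximal runs of gapped columns that lie between gapless columns."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    regions: List[Region] = []
    start = None  # index where the current gapped run began, if any
    for i, (a, b) in enumerate(zip(row_a, row_b)):
        gapped = a == gap or b == gap
        if gapped and start is None:
            start = i  # a new gapped region opens here
        elif not gapped and start is not None:
            # a gapless column closes the current region
            regions.append((row_a[start:i], row_b[start:i]))
            start = None
    if start is not None:  # region running to the end of the alignment
        regions.append((row_a[start:], row_b[start:]))
    return regions


def factorized_probability(
    regions: List[Region],
    overall_factor: float,
    region_factor: Callable[[Region], float],
) -> float:
    """Overall factor times the product of per-region contributions.

    `region_factor` is a hypothetical placeholder supplied by the caller.
    """
    return overall_factor * math.prod(region_factor(r) for r in regions)


regions = split_at_gapless_columns("AC--GT-A", "A-TTGTCA")
print(regions)
```

For the toy alignment above, the split yields two regions, `('C--', '-TT')` and `('-', 'C')`, each bounded by gapless columns. Whether the product form is valid for a given evolutionary model is exactly what the paper's conditions decide; the sketch simply assumes it.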

X Demographics


Demographic data were collected from the profiles of the 12 X users who shared this research output.
Mendeley readers


The data shown below were compiled from readership statistics for 43 Mendeley readers of this research output.

Geographical breakdown

Country        Count  As %
United States  2      5%
Spain          1      2%
Germany        1      2%
Unknown        39     91%

Demographic breakdown

Readers by professional status  Count  As %
Student > Master                10     23%
Researcher                      9      21%
Student > Ph. D. Student        8      19%
Student > Doctoral Student      3      7%
Student > Bachelor              3      7%
Other                           7      16%
Unknown                         3      7%

Readers by discipline                         Count  As %
Biochemistry, Genetics and Molecular Biology  15     35%
Agricultural and Biological Sciences          10     23%
Computer Science                              10     23%
Psychology                                    1      2%
Physics and Astronomy                         1      2%
Other                                         2      5%
Unknown                                       4      9%
Attention Score in Context


This research output has an Altmetric Attention Score of 15. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 13 February 2017.
  • All research outputs: #2,164,998 of 23,881,329 outputs
  • Outputs from BMC Bioinformatics: #536 of 7,454 outputs
  • Outputs of similar age: #37,907 of 323,676 outputs
  • Outputs of similar age from BMC Bioinformatics: #9 of 119 outputs
Altmetric has tracked 23,881,329 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 90th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,454 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 92% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 323,676 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 88% of its contemporaries.
We're also able to compare this research output to 119 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 93% of its contemporaries.
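
The percentages quoted above can be checked against the rank figures. The following is a minimal Python sketch under an assumption that is not stated on this page: that "scoring higher than X%" means the share of the other tracked outputs ranked below this one, rounded down to a whole percent. The function name is illustrative only.

```python
def percent_outscored(rank: int, total: int) -> int:
    """Whole-number percentage of the *other* outputs ranked below `rank`.

    Assumed formula (not confirmed by the source): floor of
    (total - rank) / (total - 1) * 100, computed in integer arithmetic.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("need total >= 2 and 1 <= rank <= total")
    return (total - rank) * 100 // (total - 1)


# (rank, total) pairs as reported in the text
figures = {
    "all research outputs": (2_164_998, 23_881_329),
    "outputs from BMC Bioinformatics": (536, 7_454),
    "outputs of similar age": (37_907, 323_676),
    "similar age, BMC Bioinformatics": (9, 119),
}

for label, (rank, total) in figures.items():
    print(f"{label}: higher than {percent_outscored(rank, total)}%")
```

Under that assumption the four comparisons come out at 90%, 92%, 88% and 93%, which agrees with the figures quoted in the text.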