Comparison of methods for estimating the nucleotide substitution matrix

Overview of attention for article published in BMC Bioinformatics, December 2008

Citations

9 Dimensions

Readers on

53 Mendeley
5 CiteULike
2 Connotea
Title
Comparison of methods for estimating the nucleotide substitution matrix
Published in
BMC Bioinformatics, December 2008
DOI 10.1186/1471-2105-9-511
Pubmed ID
Authors

Maribeth Oscamou, Daniel McDonald, Von Bing Yap, Gavin A Huttley, Manuel E Lladser, Rob Knight

Abstract

The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. The methods differ in speed by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genome-wide scale across the tree of life.
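
The simplest family of methods named in the abstract (count sequence differences, then take the log of the resulting probability matrix) and the accuracy measure (Euclidean distance between the true and inferred matrices) can be illustrated with a minimal sketch. This is not the specific procedure of Gojobori et al., Goldman et al., or Barry and Hartigan; it only shows the generic idea on a single pair of aligned sequences. All function names are hypothetical, and the matrix logarithm uses a plain power series that assumes the probability matrix is close to the identity (few observed substitutions).

```python
import math

NUCLEOTIDES = "ACGT"


def count_matrix(seq_a, seq_b):
    """Count aligned nucleotide pairs; rows index seq_a, columns index seq_b."""
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq_a, seq_b):
        if a in index and b in index:  # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1.0
    return counts


def row_normalize(counts):
    """Convert counts into a row-stochastic probability matrix P."""
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a nucleotide never occurs in the first sequence")
        probs.append([c / total for c in row])
    return probs


def mat_mul(x, y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_log(p, terms=200):
    """Approximate log(P) by the series sum_{k>=1} (-1)^(k+1) (P - I)^k / k.

    Assumption: P is close enough to the identity for the series to converge.
    The result estimates the rate matrix scaled by the (unknown) elapsed time.
    """
    n = len(p)
    a = [[p[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    power = [row[:] for row in a]
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = mat_mul(power, a)
    return result


def euclidean_distance(x, y):
    """Euclidean (Frobenius) distance between two equally sized matrices."""
    return math.sqrt(sum((x[i][j] - y[i][j]) ** 2
                         for i in range(len(x)) for j in range(len(x[0]))))


if __name__ == "__main__":
    # Toy probability matrix: every nucleotide is kept with probability 0.85.
    p = [[0.85 if i == j else 0.05 for j in range(4)] for i in range(4)]
    q_t = matrix_log(p)
    print(euclidean_distance(q_t, matrix_log(p)))
```

In practice one would call `row_normalize(count_matrix(seq_a, seq_b))` to obtain the probability matrix from real data and then compare the output of `matrix_log` against a known rate matrix with `euclidean_distance`. A production implementation would more likely rely on an established linear algebra library for the matrix logarithm.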

Mendeley readers

The data shown below were compiled from readership statistics for 53 Mendeley readers of this research output.

Geographical breakdown

Country           Count   As %
United States     3       6%
Australia         2       4%
Germany           1       2%
India             1       2%
Sweden            1       2%
Denmark           1       2%
United Kingdom    1       2%
Unknown           43      81%

Demographic breakdown

Readers by professional status                  Count   As %
Researcher                                      15      28%
Student > Ph. D. Student                        11      21%
Professor                                       5       9%
Student > Master                                5       9%
Student > Bachelor                              4       8%
Other                                           11      21%
Unknown                                         2       4%

Readers by discipline                           Count   As %
Agricultural and Biological Sciences            25      47%
Computer Science                                8       15%
Biochemistry, Genetics and Molecular Biology    7       13%
Environmental Science                           2       4%
Mathematics                                     2       4%
Other                                           6       11%
Unknown                                         3       6%