Title | Analysis of composition-based metagenomic classification |
---|---|
Published in | BMC Genomics, October 2012 |
DOI | 10.1186/1471-2164-13-s5-s1 |
Pubmed ID | |
Authors | Susan Higashi, André da Motta Salles Barreto, Maurício Egidio Cantão, Ana Tereza Ribeiro de Vasconcelos |
Abstract | An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. |
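
The first two parameters named in the abstract, n-mer length and similarity measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `nmer_profile`, `cosine_similarity`, and `euclidean_distance` are hypothetical, and the choice of cosine similarity and Euclidean distance is an assumption made for illustration, since the abstract does not say which measures the study compares.

```python
from collections import Counter
from itertools import product
import math


def nmer_profile(seq, n):
    """Encode a DNA sequence as a normalized n-mer frequency vector.

    The vector has 4**n entries, one per n-mer over A, C, G, T in
    lexicographic order. Windows containing other symbols (e.g. N)
    are ignored. This encoding is an illustrative assumption.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    alphabet = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts[k] for k in alphabet)
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[k] / total for k in alphabet]


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two profiles (0.0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Two made-up reads, compared under both measures for n = 2.
    read_a = nmer_profile("ACGTACGTGGCC", 2)
    read_b = nmer_profile("ACGTTTGGCCAA", 2)
    print(cosine_similarity(read_a, read_b))
    print(euclidean_distance(read_a, read_b))
```

Changing `n` changes the dimensionality of the encoding (4 to the power n), and swapping the comparison function changes which sequences count as close. These are the kinds of decisions whose impact the abstract says the framework analyzes.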
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 50% |
Canada | 1 | 25% |
Chile | 1 | 25% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 2 | 50% |
Scientists | 2 | 50% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Brazil | 3 | 7% |
Spain | 1 | 2% |
France | 1 | 2% |
United States | 1 | 2% |
Unknown | 35 | 85% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 10 | 24% |
Student > Ph. D. Student | 9 | 22% |
Student > Master | 8 | 20% |
Student > Doctoral Student | 4 | 10% |
Student > Bachelor | 4 | 10% |
Other | 4 | 10% |
Unknown | 2 | 5% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 20 | 49% |
Computer Science | 8 | 20% |
Biochemistry, Genetics and Molecular Biology | 7 | 17% |
Environmental Science | 2 | 5% |
Unspecified | 1 | 2% |
Other | 1 | 2% |
Unknown | 2 | 5% |