Taxonomy-aware feature engineering for microbiome classification

Overview of attention for article published in BMC Bioinformatics, June 2018
About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (86th percentile)
  • High Attention Score compared to outputs of the same age and source (92nd percentile)

Mentioned by

  • 1 blog
  • 12 X users
  • 1 patent

Readers on

  • 111 Mendeley
  • 1 CiteULike
Title: Taxonomy-aware feature engineering for microbiome classification
Published in: BMC Bioinformatics, June 2018
DOI: 10.1186/s12859-018-2205-3
Authors: Mai Oudah, Andreas Henschel

Abstract

What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. These diseases are often not attributable to a single pathogen but rather result from complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering. We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. We demonstrate substantial improvements in classification accuracy over state-of-the-art microbiota classification tools, regardless of the actual Machine Learning technique, while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
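The "taxonomic abstraction" mentioned in the abstract can be pictured as replacing several operational taxonomic unit (OTU) features with a single feature for their shared ancestor. The sketch below shows only that aggregation step and is not the authors' algorithm, which also has to decide which taxa are worth generalizing. The lineage table, the sample values, and the function name `generalize` are all hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical lineages: OTU id -> (phylum, class, genus).
LINEAGES = {
    "otu1": ("Firmicutes", "Clostridia", "Faecalibacterium"),
    "otu2": ("Firmicutes", "Clostridia", "Roseburia"),
    "otu3": ("Firmicutes", "Bacilli", "Lactobacillus"),
    "otu4": ("Bacteroidetes", "Bacteroidia", "Bacteroides"),
}


def generalize(sample, lineages, depth):
    """Sum OTU relative abundances per taxon at the given depth (1 = phylum)."""
    features = defaultdict(float)
    for otu, abundance in sample.items():
        # The lineage prefix identifies the ancestor taxon at this depth.
        taxon = ";".join(lineages[otu][:depth])
        features[taxon] += abundance
    return dict(features)


# Hypothetical relative abundances for one sample (they sum to 1).
sample = {"otu1": 0.25, "otu2": 0.25, "otu3": 0.125, "otu4": 0.375}

by_class = generalize(sample, LINEAGES, depth=2)
by_phylum = generalize(sample, LINEAGES, depth=1)
```

Moving from OTUs to phyla shrinks this toy feature space from four columns to two, which is the kind of reduction the abstract refers to. A real pipeline would presumably keep a generalized feature only where it separates the classes better than its descendants do.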

X Demographics

The data shown below were collected from the profiles of 12 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 111 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    111      100%

Demographic breakdown

Readers by professional status                  Count    As %
Researcher                                      24       22%
Student > Ph. D. Student                        16       14%
Student > Master                                14       13%
Student > Doctoral Student                      9        8%
Student > Bachelor                              8        7%
Other                                           17       15%
Unknown                                         23       21%

Readers by discipline                           Count    As %
Biochemistry, Genetics and Molecular Biology    25       23%
Agricultural and Biological Sciences            21       19%
Computer Science                                13       12%
Engineering                                     5        5%
Immunology and Microbiology                     3        3%
Other                                           19       17%
Unknown                                         25       23%

Attention Score in Context

This research output has an Altmetric Attention Score of 16. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 June 2021.
All research outputs: #2,252,360 of 24,885,505 outputs
Outputs from BMC Bioinformatics: #551 of 7,601 outputs
Outputs of similar age: #45,746 of 334,780 outputs
Outputs of similar age from BMC Bioinformatics: #8 of 99 outputs

Altmetric has tracked 24,885,505 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 90th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,601 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 92% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 334,780 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 86% of its contemporaries.
We're also able to compare this research output to 99 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 92% of its contemporaries.
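
The four percentile figures quoted above can be related to the rankings with a short calculation. The sketch below uses one formula that happens to reproduce all four numbers: the share of outputs ranked at or below this one, rounded down. This is an assumption made for illustration; the page does not state how Altmetric actually computes its percentiles, and the function name `percentile_from_rank` is hypothetical.

```python
def percentile_from_rank(rank, total):
    """Whole-percent share of outputs ranked at or below `rank`, rounded down.

    Assumed formula, not a documented Altmetric calculation.
    """
    return 100 * (total - rank + 1) // total


# (rank, total) pairs taken from the rankings on this page.
RANKINGS = {
    "all outputs": (2_252_360, 24_885_505),
    "BMC Bioinformatics": (551, 7_601),
    "similar age": (45_746, 334_780),
    "similar age, BMC Bioinformatics": (8, 99),
}

percentiles = {
    label: percentile_from_rank(rank, total)
    for label, (rank, total) in RANKINGS.items()
}

for label, value in percentiles.items():
    print(f"{label}: {value}")
```

Integer arithmetic is used so that the result does not depend on floating-point rounding near a whole-percent boundary.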