Higher recall in metagenomic sequence classification exploiting overlapping reads

Overview of attention for article published in BMC Genomics, December 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (80th percentile)
  • High Attention Score compared to outputs of the same age and source (82nd percentile)

Mentioned by
16 X users

Citations
13 Dimensions

Readers on
11 Mendeley
Title
Higher recall in metagenomic sequence classification exploiting overlapping reads
Published in
BMC Genomics, December 2017
DOI 10.1186/s12864-017-4273-6
Authors

Samuele Girotto, Matteo Comin, Cinzia Pizzi

Abstract

In recent years several fields, such as ecology, medicine and microbiology, have experienced unprecedented development thanks to the possibility of directly sequencing microbiomic samples. Among the problems that researchers in the field have to deal with, taxonomic classification of metagenomic reads is one of the most challenging. State-of-the-art methods classify single reads with almost 100% precision. However, recall very often falls to about 50%. As a consequence, state-of-the-art methods are capable of correctly classifying only about half of the reads in a sample. How to achieve better overall classification quality remains a largely unsolved problem. In this paper we propose a method for metagenomics CLassification Improvement with Overlapping Reads (CLIOR), which exploits the information carried by the overlapping reads graph of the input read dataset to improve recall, F-measure, and the estimated abundance of species. In this work, we applied CLIOR on top of the classification produced by the classifier Clark-l. Experiments on simulated and synthetic metagenomes show that CLIOR can lead to substantial improvement of the recall rate, sometimes doubling it. On average, on simulated datasets, the increase in recall is paired with a higher precision too, while on synthetic datasets it comes at the expense of a small loss of precision. In experiments on real metagenomes CLIOR is able to assign many more reads while keeping the abundance ratios in line with previous studies. Our results show that with CLIOR it is possible to boost the recall of a state-of-the-art metagenomic classifier by inferring and/or correcting the assignment of reads with missing or erroneous labeling. CLIOR is not restricted to the read classification algorithm used in our experiments; it may be applied to other methods too. Finally, CLIOR does not need large computational resources, and it can be run on a laptop.
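
The idea in the abstract, using overlaps between reads to assign labels that a classifier left missing, can be illustrated with a toy example. The sketch below is not CLIOR's actual algorithm, which this page does not describe: it assumes a naive all-pairs suffix/prefix overlap test and a single majority-vote pass, and every function name, read sequence and species label in it is hypothetical.

```python
from collections import Counter, defaultdict


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Undirected graph linking reads that overlap by at least min_len bases (naive O(n^2))."""
    graph = defaultdict(set)
    ids = list(reads)
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if (suffix_prefix_overlap(reads[u], reads[v], min_len)
                    or suffix_prefix_overlap(reads[v], reads[u], min_len)):
                graph[u].add(v)
                graph[v].add(u)
    return graph


def propagate_labels(graph, labels):
    """One pass: each unlabeled read takes the majority label of its labeled neighbours."""
    updated = dict(labels)
    for read_id, label in labels.items():
        if label is not None:
            continue
        votes = Counter(labels[n] for n in graph.get(read_id, ())
                        if labels.get(n) is not None)
        if votes:
            updated[read_id] = votes.most_common(1)[0][0]
    return updated


def recall(predicted, truth):
    """Fraction of all reads that received the correct label."""
    return sum(predicted.get(r) == t for r, t in truth.items()) / len(truth)


def precision(predicted, truth):
    """Fraction of assigned reads whose label is correct."""
    assigned = [r for r, lab in predicted.items() if lab is not None]
    if not assigned:
        return 0.0
    return sum(predicted[r] == truth[r] for r in assigned) / len(assigned)


# Hypothetical toy data: r1, r2, r3 chain together; r4 overlaps nothing.
reads = {
    "r1": "ACGTACGTAA",
    "r2": "CGTAAGGCTT",
    "r3": "GGCTTACCAG",
    "r4": "TTTTTTTTTT",
}
truth = {"r1": "species_A", "r2": "species_A", "r3": "species_A", "r4": "species_B"}
initial = {"r1": "species_A", "r2": None, "r3": "species_A", "r4": None}

graph = build_overlap_graph(reads, min_len=5)
improved = propagate_labels(graph, initial)

print("recall before:", recall(initial, truth), "after:", recall(improved, truth))
print("precision before:", precision(initial, truth), "after:", precision(improved, truth))
```

In this toy case the unlabeled read r2 inherits the label of its two overlapping neighbours, so recall rises while precision is unchanged; r4 has no overlaps and stays unassigned. A real read set would need an indexed overlap search rather than the all-pairs comparison assumed here.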

X Demographics

The data shown below were collected from the profiles of 16 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 11 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 11 100%

Demographic breakdown

Readers by professional status Count As %
Lecturer 2 18%
Student > Doctoral Student 2 18%
Student > Master 2 18%
Student > Bachelor 1 9%
Other 1 9%
Other 2 18%
Unknown 1 9%
Readers by discipline Count As %
Agricultural and Biological Sciences 5 45%
Biochemistry, Genetics and Molecular Biology 3 27%
Computer Science 2 18%
Unknown 1 9%
Attention Score in Context

This research output has an Altmetric Attention Score of 8. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 21 December 2017.
All research outputs: #3,827,192 of 23,011,300 outputs
Outputs from BMC Genomics: #1,540 of 10,697 outputs
Outputs of similar age: #82,538 of 439,982 outputs
Outputs of similar age from BMC Genomics: #40 of 228 outputs
Altmetric has tracked 23,011,300 research outputs across all sources so far. Compared to these, this one has done well and is in the 82nd percentile, which places it in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 10,697 research outputs from this source. They receive a mean Attention Score of 4.7. This one has done well, scoring higher than 85% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 439,982 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 80% of its contemporaries.
We're also able to compare this research output to 228 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 82% of its contemporaries.
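
As a rough check, the percentile statements above can be approximated from the rank and total figures. The sketch below assumes the simplest possible formula, the share of tracked outputs ranked below this one; Altmetric's exact method (for example, how tied scores are handled) is not stated on this page, and the function name is hypothetical.

```python
def percent_outranked(rank, total):
    """Percentage of tracked outputs ranked below the given 1-based rank (assumed formula)."""
    return 100.0 * (total - rank) / total


# Rank and total figures as listed on this page.
rankings = {
    "All research outputs": (3_827_192, 23_011_300),
    "Outputs from BMC Genomics": (1_540, 10_697),
    "Outputs of similar age": (82_538, 439_982),
    "Outputs of similar age from BMC Genomics": (40, 228),
}

for name, (rank, total) in rankings.items():
    print(f"{name}: about {percent_outranked(rank, total):.1f}% ranked lower")
```

Under this assumption each result lands within about two points of the percentage quoted in the text, so small differences from the stated figures are to be expected.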