
MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs

Overview of attention for article published in BMC Bioinformatics, October 2017

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (89th percentile)
  • High Attention Score compared to outputs of the same age and source (92nd percentile)

Mentioned by

  • 1 blog
  • 22 X users
  • 1 Facebook page

Citations

  • 13 Dimensions

Readers on

  • 73 Mendeley
Title: MegaGTA: a sensitive and accurate metagenomic gene-targeted assembler using iterative de Bruijn graphs
Published in: BMC Bioinformatics, October 2017
DOI: 10.1186/s12859-017-1825-3
Authors: Dinghua Li, Yukun Huang, Chi-Ming Leung, Ruibang Luo, Hing-Fung Ting, Tak-Wah Lam

Abstract

The recent release of the gene-targeted metagenomic assembler Xander demonstrated that using a trained Hidden Markov Model (HMM) to guide the traversal of a de Bruijn graph gives a clear advantage over other assembly methods. As a pilot study, however, Xander leaves considerable room for improvement. Apart from its slow speed, Xander uses only a single k-mer size for graph construction, and any choice of k compromises either sensitivity or accuracy. Xander uses a Bloom-filter representation of the de Bruijn graph to achieve a lower memory footprint, but Bloom filters introduce false positives, and it is not clear how these affect assembly quality. Xander also does not keep track of the multiplicity of k-mers, which would have been an effective way to differentiate erroneous k-mers from correct ones.
In this paper, we present a new gene-targeted assembler, MegaGTA, which attempts to improve on Xander in several respects. Quality-wise, it uses iterative de Bruijn graphs to take full advantage of multiple k-mer sizes and make the best of both sensitivity and accuracy. Computation-wise, it employs succinct de Bruijn graphs (SdBG) to achieve a low memory footprint and high speed (the latter benefits from a highly efficient parallel algorithm for constructing SdBGs). Unlike a Bloom filter, an SdBG is an exact representation of a de Bruijn graph. It enables MegaGTA to avoid false-positive contigs and to easily incorporate the multiplicity of k-mers to build a better HMM model.
We compared MegaGTA and Xander on an HMP-defined mock metagenomic dataset and showed that MegaGTA excelled in both sensitivity and accuracy. On a large rhizosphere soil metagenomic sample (327 Gbp), MegaGTA produced 9.7-19.3% more contigs than Xander, and these contigs were assigned to 10-25% more gene references. In our experiments, MegaGTA was two to ten times faster than Xander, depending on the number of k-mer sizes used.
MegaGTA improves on the algorithm of Xander and achieves higher sensitivity, accuracy and speed. Moreover, it is capable of assembling gene sequences from ultra-large metagenomic datasets. Its source code is freely available at https://github.com/HKU-BAL/megagta .
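
To make two ideas from the abstract concrete (exact k-mer multiplicities and graphs built at more than one k), here is a minimal Python sketch. It is an illustration only, not MegaGTA's implementation: it uses a plain in-memory counter rather than a succinct de Bruijn graph, it does not carry contigs from one k to the next, and it does no HMM-guided traversal. The function names, the toy reads, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer exactly (no Bloom filter, so no false positives)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times; rarer ones are likely errors."""
    return {kmer for kmer, c in counts.items() if c >= min_count}


def successors(kmer, kmer_set):
    """Outgoing de Bruijn edges: k-mers that start with the last k-1 bases."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in kmer_set]


# Toy input (hypothetical): three clean reads and one read with a single error.
reads = ["ACGTACGGA", "ACGTACGGA", "ACGTACGGA", "ACGTTCGGA"]

# Build the k-mer set at several k sizes and compare raw vs. filtered sizes.
for k in (4, 5, 6):
    counts = count_kmers(reads, k)
    solid = solid_kmers(counts, min_count=2)
    print(f"k={k}: {len(counts)} distinct k-mers, {len(solid)} after filtering")
```

With multiplicities available, the branch created by the erroneous read can be dropped before traversal: `successors("ACGT", set(count_kmers(reads, 4)))` returns two candidates, while the same call on the filtered set returns only the well-supported one.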

X Demographics

Demographic data were collected from the profiles of the 22 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 73 Mendeley readers of this research output.

Geographical breakdown

Country                                       Count   As %
Unknown                                       73      100%

Demographic breakdown

Readers by professional status                Count   As %
Researcher                                    22      30%
Student > Ph. D. Student                      15      21%
Student > Bachelor                            6       8%
Professor > Associate Professor               4       5%
Student > Doctoral Student                    3       4%
Other                                         10      14%
Unknown                                       13      18%

Readers by discipline                         Count   As %
Agricultural and Biological Sciences          18      25%
Biochemistry, Genetics and Molecular Biology  14      19%
Computer Science                              5       7%
Engineering                                   5       7%
Environmental Science                         3       4%
Other                                         13      18%
Unknown                                       15      21%
Attention Score in Context


This research output has an Altmetric Attention Score of 19. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 10 February 2018.
All research outputs: #1,804,805 of 24,336,902 outputs
Outputs from BMC Bioinformatics: #377 of 7,517 outputs
Outputs of similar age: #36,185 of 329,831 outputs
Outputs of similar age from BMC Bioinformatics: #10 of 122 outputs
Altmetric has tracked 24,336,902 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 92nd percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,517 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 94% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 329,831 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 89% of its contemporaries.
We're also able to compare this research output to 122 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 92% of its contemporaries.
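
The "scoring higher than N%" figures above can be reproduced from the ranks and totals listed on this page. The sketch below assumes that rank 1 is the highest score, that the comparison excludes the output itself, and that the result is rounded down. This is a reconstruction that happens to match the four figures quoted here, not Altmetric's documented formula, and the function name is hypothetical.

```python
def percent_outscored(rank, total):
    """Whole-number share of the other outputs that rank below this one.

    Assumption: rank 1 is the highest score, the output itself is excluded
    from the comparison group, and the percentage is floored.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("need total >= 2 and 1 <= rank <= total")
    return (total - rank) * 100 // (total - 1)


# Ranks and totals as listed on this page.
rankings = {
    "All research outputs": (1_804_805, 24_336_902),
    "Outputs from BMC Bioinformatics": (377, 7_517),
    "Outputs of similar age": (36_185, 329_831),
    "Outputs of similar age from BMC Bioinformatics": (10, 122),
}

for label, (rank, total) in rankings.items():
    share = percent_outscored(rank, total)
    print(f"{label}: higher than {share}% of the others")
```

Under these assumptions the four rankings give 92, 94, 89 and 92 respectively, in line with the percentages stated in the text.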