
De novo assembly of highly polymorphic metagenomic data using in situ generated reference sequences and a novel BLAST-based assembly pipeline

Overview of attention for article published in BMC Bioinformatics, April 2017

About this Attention Score

  • Good Attention Score compared to outputs of the same age (66th percentile)
  • Good Attention Score compared to outputs of the same age and source (69th percentile)

Mentioned by
8 X users

Citations
15 (Dimensions)

Readers on Mendeley
44
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.
Published in
BMC Bioinformatics, April 2017
DOI 10.1186/s12859-017-1630-z
Pubmed ID
Authors

You-Yu Lin, Chia-Hung Hsieh, Jiun-Hong Chen, Xuemei Lu, Jia-Horng Kao, Pei-Jer Chen, Ding-Shinn Chen, Hurng-Yi Wang

Abstract

The accuracy of metagenomic assembly is usually compromised by high levels of polymorphism, because divergent reads from the same genomic region are recognized as different loci when they are sequenced and assembled together. A viral quasispecies is a group of abundant, diverse, genetically related viruses found in a single carrier. Current mainstream assembly methods, such as Velvet and SOAPdenovo, were not originally intended for assembling such metagenomic data, so new methods are needed to provide accurate and informative assembly results.

In this study, we present a hybrid method for assembling highly polymorphic data that combines the partial de novo-reference assembly (PDR) strategy with the BLAST-based assembly pipeline (BBAP). The PDR strategy generates in situ reference sequences through de novo assembly of a randomly extracted partial data set; these references are then used for reference assembly of the full data set. BBAP employs a greedy algorithm to assemble polymorphic reads. We used 12 hepatitis B virus (HBV) quasispecies NGS data sets from a previous study to assess and compare the performance of PDR and BBAP. Our analyses suggest that the high polymorphism of a full metagenomic data set leads to fragmented de novo assembly results, whereas the biased or limited representation of external reference sequences incorporated fewer reads into the assembly, with lower assembly accuracy and variation sensitivity. In comparison, the in situ reference sequence generated by PDR incorporated more reads into the final PDR assembly of the full metagenomic data set, with greater accuracy and higher variation sensitivity. BBAP assembly results also suggest higher assembly efficiency and accuracy than other assembly methods. Additionally, BBAP assembly recovered HBV structural variants that were not observed among the assembly results of other methods. Together, PDR/BBAP assembly results were significantly better than those of the other methods compared.

Both PDR and BBAP independently increased the assembly efficiency and accuracy of highly polymorphic data, and assembly performance improved further when they were used together. BBAP also provides nucleotide frequency information. Together, PDR and BBAP provide powerful tools for metagenomic data studies.

X Demographics

The data shown below were collected from the profiles of 8 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 44 Mendeley readers of this research output.

Geographical breakdown

Country   Count   Share
Unknown      44    100%

Demographic breakdown

Readers by professional status      Count   Share
Researcher                             13     30%
Student > Ph. D. Student               12     27%
Student > Bachelor                      3      7%
Student > Master                        3      7%
Professor > Associate Professor         2      5%
Other                                   4      9%
Unknown                                 7     16%

Readers by discipline                           Count   Share
Agricultural and Biological Sciences               12     27%
Biochemistry, Genetics and Molecular Biology       10     23%
Immunology and Microbiology                         4      9%
Medicine and Dentistry                              3      7%
Nursing and Health Professions                      1      2%
Other                                               5     11%
Unknown                                             9     20%
Attention Score in Context

This research output has an Altmetric Attention Score of 5. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 08 June 2017.
All research outputs: #6,513,240 of 23,498,099 outputs
Outputs from BMC Bioinformatics: #2,455 of 7,400 outputs
Outputs of similar age: #101,691 of 310,973 outputs
Outputs of similar age from BMC Bioinformatics: #39 of 123 outputs
Altmetric has tracked 23,498,099 research outputs across all sources so far. This one has received more attention than most of these and is in the 72nd percentile.
So far Altmetric has tracked 7,400 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. Although this output's score of 5 is slightly below that mean, it has received more attention than most of its peers, scoring higher than 66% of them.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 310,973 tracked outputs that were published within six weeks on either side of this one in any source. This one has received more attention than most of them, scoring higher than 66% of its contemporaries.
We're also able to compare this research output to 123 others from the same source published within six weeks on either side of this one. This one has received more attention than most of them, scoring higher than 69% of its contemporaries.