Report for: A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis

Title	A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis
Published in	Human Genomics, July 2016
DOI	10.1186/s40246-016-0068-0
Pubmed ID	27461106
Authors	Isaac Akogwu, Nan Wang, Chaoyang Zhang, Ping Gong
Abstract	Innumerable opportunities for new genomic research have been stimulated by advancement in high-throughput next-generation sequencing (NGS). However, the pitfall of NGS data abundance is the complication of distinction between true biological variants and sequence error alterations during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but independent evaluation of the impact of such dataset features as read length, genome size, and coverage depth on their performance is lacking. This comparative study aims to investigate the strength and weakness as well as limitations of some newest k-spectrum-based methods and to provide recommendations for users in selecting suitable methods with respect to specific NGS datasets. Six k-spectrum-based methods, i.e., Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel, were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 MB). Error Correction Evaluation Toolkit (ECET) was employed to derive a suite of metrics (i.e., true positives, false positive, false negative, recall, precision, gain, and F-score) for assessing the correction quality of each method. Results from computational experiments indicate that Musket had the best overall performance across the spectra of examined variants reflected in the six datasets. The lowest accuracy of Musket (F-score = 0.81) occurred to a dataset with a medium read length (56 bp), a medium coverage (50×), and a small-sized genome (5.4 MB). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets. This study demonstrates that various factors such as coverage depth, read length, and genome size may influence performance of individual k-spectrum-based error correction methods. 
Thus, efforts have to be paid in choosing appropriate methods for error correction of specific NGS datasets. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six testing datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diversified parameter settings (k-mer values and edit distances) and to compare them against other non-k-spectrum-based classes of error correction methods.
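
The abstract lists the metrics ECET reports (true positives, false positives, false negatives, recall, precision, gain, and F-score) without giving their formulas. As a minimal sketch, assuming the definitions conventionally used in read error correction benchmarks (recall = TP/(TP+FN), precision = TP/(TP+FP), gain = (TP-FP)/(TP+FN), F-score = harmonic mean of precision and recall), the derived metrics could be computed from the three counts as follows. The function name and the example counts are hypothetical and are not taken from the study or from ECET itself.

```python
def correction_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Derive recall, precision, gain, and F-score from error-correction counts.

    tp: erroneous bases that were correctly fixed
    fp: correct bases that were wrongly changed
    fn: erroneous bases that were left uncorrected
    The formulas are the conventional ones (an assumption; the abstract
    does not spell them out).
    """
    total_errors = tp + fn    # all true errors present in the reads
    total_changes = tp + fp   # all bases the tool modified

    recall = tp / total_errors if total_errors else 0.0
    precision = tp / total_changes if total_changes else 0.0
    # Gain penalizes newly introduced errors; it goes negative when a tool
    # introduces more errors than it removes.
    gain = (tp - fp) / total_errors if total_errors else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0

    return {
        "recall": recall,
        "precision": precision,
        "gain": gain,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only.
example = correction_metrics(tp=80, fp=20, fn=20)
```

Under these assumed definitions, gain is the only metric of the four that can be negative, which makes it a useful check on whether a correction tool is doing net harm to a dataset.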

X Demographics

The data shown below were collected from the profiles of 7 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United Kingdom	3	43%
Unknown	4	57%

Demographic breakdown

Type	Count	As %
Members of the public	4	57%
Scientists	3	43%

Mendeley readers

The data shown below were compiled from readership statistics for 52 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	1	2%
Netherlands	1	2%
Sweden	1	2%
France	1	2%
Unknown	48	92%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	14	27%
Student > Ph. D. Student	9	17%
Student > Bachelor	7	13%
Student > Master	7	13%
Professor > Associate Professor	3	6%
Other	7	13%
Unknown	5	10%

Readers by discipline	Count	As %
Biochemistry, Genetics and Molecular Biology	13	25%
Agricultural and Biological Sciences	12	23%
Computer Science	8	15%
Medicine and Dentistry	2	4%
Engineering	2	4%
Other	7	13%
Unknown	8	15%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 August 2016.

All research outputs

#8,534,528

of 25,373,627 outputs

Outputs from Human Genomics

#211

of 564 outputs

Outputs of similar age

#136,977

of 379,928 outputs

Outputs of similar age from Human Genomics

of 12 outputs

Altmetric has tracked 25,373,627 research outputs across all sources so far. This one is in the 43rd percentile – i.e., 43% of other outputs scored the same or lower than it.

So far Altmetric has tracked 564 research outputs from this source. They typically receive more attention than average, with a mean Attention Score of 7.6. This one has gotten more attention than average, scoring higher than 53% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 379,928 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 47th percentile – i.e., 47% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 12 others from the same source and published within six weeks on either side of this one. This one is in the 41st percentile – i.e., 41% of its contemporaries scored the same or lower than it.
