
Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers

Overview of attention for article published in BMC Bioinformatics, January 2017

About this Attention Score

  • Above-average Attention Score compared to outputs of the same age (51st percentile)
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

  • 3 X users
  • 1 research highlight platform (F1000)

Readers on Mendeley

  • 115 readers
Title
Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers
Published in
BMC Bioinformatics, January 2017
DOI 10.1186/s12859-016-1417-7
Pubmed ID
Authors

Ariane L. Hofmann, Jonas Behr, Jochen Singer, Jack Kuipers, Christian Beisel, Peter Schraml, Holger Moch, Niko Beerenwinkel

Abstract

Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, so the whole exome of a tumor can now be sequenced for a small fraction of total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations that occur at very low frequencies in a cancer tissue sample and may be clinically relevant.

Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2, as a function of variant allele frequency and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed the false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain pipeline steps or tool parameters can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. At a fixed precision of 90%, the rank combination of deepSNV, JointSNVMix2, MuTect, SiNVICT, and VarScan2 reached a sensitivity of 78%, outperforming every individual tool; the best individual tool reached 71% at the same precision.

The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. We were able to relate the substantial differences in performance we observed to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and advance the field of precision cancer treatment.
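The abstract reports sensitivity at a fixed precision for a rank combination of callers. The following Python sketch illustrates that kind of evaluation on invented data. It is not the authors' code, and it makes three assumptions. First, each caller's output is modeled as a dictionary mapping a variant ID to a confidence score. Second, ranks are combined by simple averaging, and a variant that a caller did not report receives that caller's worst rank plus one. Third, "sensitivity at 90% precision" is taken as the largest recall reached at any cutoff of the ranking whose precision is at least 0.9. The paper's actual combination rule may differ. The function names, variant IDs, and scores are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: combine per-caller rankings by averaging ranks and
# report sensitivity at a fixed precision. Not the paper's implementation.


def ranks_from_scores(scores):
    """Return {variant: rank} for one caller, rank 1 = highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {variant: i + 1 for i, variant in enumerate(ordered)}


def combine_ranks(per_tool_scores):
    """Average each variant's rank across callers (lower = more confident)."""
    per_tool_ranks = [ranks_from_scores(s) for s in per_tool_scores]
    variants = set().union(*per_tool_ranks)
    # Assumption: a variant a caller did not report gets that caller's
    # worst rank plus one.
    return {
        v: mean(r.get(v, len(r) + 1) for r in per_tool_ranks)
        for v in variants
    }


def sensitivity_at_precision(combined, truth, min_precision=0.9):
    """Highest sensitivity (recall) over all cutoffs of the ranking whose
    precision is at least min_precision; 0.0 if no cutoff qualifies."""
    if not truth:
        raise ValueError("truth set must not be empty")
    # Ties in the combined rank are broken by variant ID for determinism.
    ordered = sorted(combined, key=lambda v: (combined[v], v))
    true_positives = 0
    best = 0.0
    for k, variant in enumerate(ordered, start=1):
        if variant in truth:
            true_positives += 1
        if true_positives / k >= min_precision:
            best = max(best, true_positives / len(truth))
    return best


# Invented toy data: three true somatic variants and two false candidates.
truth = {"chr1:101", "chr2:202", "chr3:303"}
tool1 = {"chr1:101": 0.9, "chr7:707": 0.8, "chr2:202": 0.7, "chr3:303": 0.6, "chr9:909": 0.5}
tool2 = {"chr2:202": 0.9, "chr1:101": 0.8, "chr9:909": 0.7, "chr3:303": 0.6, "chr7:707": 0.5}
tool3 = {"chr3:303": 0.9, "chr9:909": 0.8, "chr1:101": 0.7, "chr2:202": 0.6, "chr7:707": 0.5}

for name, tool in [("tool1", tool1), ("tool2", tool2), ("tool3", tool3)]:
    print(name, round(sensitivity_at_precision(combine_ranks([tool]), truth), 3))
print("combined", sensitivity_at_precision(combine_ranks([tool1, tool2, tool3]), truth))
```

In this toy example, no single caller reaches full sensitivity at 90% precision; the best reaches two thirds. The averaged ranking, by contrast, places all three true variants first. Combining ranks rather than raw scores avoids comparing confidence values that different callers report on incomparable scales.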

X Demographics

These data were collected from the profiles of the 3 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 115 Mendeley readers of this research output.

Geographical breakdown

Country         Count   As %
United Kingdom      1    <1%
Belgium             1    <1%
Unknown           113    98%

Demographic breakdown

Readers by professional status  Count   As %
Researcher                         23    20%
Student > Ph. D. Student           22    19%
Student > Master                   16    14%
Student > Bachelor                 13    11%
Other                               8     7%
Other                              14    12%
Unknown                            19    17%

Readers by discipline                         Count   As %
Biochemistry, Genetics and Molecular Biology     32    28%
Agricultural and Biological Sciences             27    23%
Computer Science                                 11    10%
Medicine and Dentistry                            7     6%
Engineering                                       7     6%
Other                                            14    12%
Unknown                                          17    15%
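The "As %" columns appear to follow a simple rule. Each count is divided by the 115 readers and rounded to the nearest whole percent, and nonzero shares under 1% are shown as "<1%". This rule is inferred from the numbers in the tables, not taken from Altmetric documentation. The short Python check below reproduces the professional-status column under that assumption. The function and variable names are hypothetical.

```python
# Assumed rule, inferred from the readership tables rather than documented by
# Altmetric: share = count / total, rounded to a whole percent, with
# nonzero shares below 1% displayed as "<1%".


def as_percent(count: int, total: int) -> str:
    share = 100 * count / total
    if 0 < share < 1:
        return "<1%"
    return f"{round(share)}%"


TOTAL_READERS = 115

# Counts copied from "Readers by professional status" (both "Other" rows kept).
status_rows = [
    ("Researcher", 23),
    ("Student > Ph. D. Student", 22),
    ("Student > Master", 16),
    ("Student > Bachelor", 13),
    ("Other", 8),
    ("Other", 14),
    ("Unknown", 19),
]

# The rows should account for every reader.
assert sum(count for _, count in status_rows) == TOTAL_READERS
for label, count in status_rows:
    print(f"{label:<26}{count:>4}{as_percent(count, TOTAL_READERS):>6}")
```

The same function also matches the discipline and country columns. For example, 27 of 115 readers gives 23%, and 1 of 115 gives <1%.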
Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 09 November 2018.
  • All research outputs: #13,005,966 of 22,931,367 outputs
  • Outputs from BMC Bioinformatics: #3,805 of 7,307 outputs
  • Outputs of similar age: #200,782 of 421,357 outputs
  • Outputs of similar age from BMC Bioinformatics: #66 of 138 outputs
Altmetric has tracked 22,931,367 research outputs across all sources so far. This one is in the 42nd percentile – i.e., 42% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,307 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 45th percentile – i.e., 45% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 421,357 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 51% of its contemporaries.
We're also able to compare this research output to 138 others from the same source and published within six weeks on either side of this one. This one has received average attention, scoring higher than 50% of its contemporaries.
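The percentile comparisons above place this output's score within peer groups. One of these groups is restricted to outputs published within six weeks on either side of it. The Python sketch below shows that computation on invented data. It uses the definition given for the overall percentile: the share of other outputs that scored the same or lower. The function names, dates, and scores are hypothetical, and Altmetric's own handling of ties and peer cohorts may differ.

```python
from datetime import date, timedelta


def contemporaries(pub_date, outputs, window=timedelta(weeks=6)):
    """Scores of outputs published within `window` on either side of pub_date."""
    return [score for published, score in outputs if abs(published - pub_date) <= window]


def percentile_same_or_lower(score, peer_scores):
    """Percentage of peer outputs whose score is the same as or lower than `score`."""
    if not peer_scores:
        raise ValueError("need at least one peer output")
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return 100 * at_or_below / len(peer_scores)


# Invented example: the output being ranked and the other tracked outputs.
PUB_DATE = date(2020, 3, 1)  # hypothetical publication date
MY_SCORE = 3
outputs = [  # (publication date, Attention Score) of other outputs
    (date(2020, 2, 1), 1),
    (date(2020, 2, 20), 3),
    (date(2020, 3, 10), 7),
    (date(2020, 4, 5), 0),
    (date(2019, 6, 1), 20),  # outside the six-week window
    (date(2020, 6, 1), 2),   # outside the six-week window
]

all_scores = [score for _, score in outputs]
peers = contemporaries(PUB_DATE, outputs)
print(percentile_same_or_lower(MY_SCORE, all_scores))  # against all outputs
print(percentile_same_or_lower(MY_SCORE, peers))       # similar age only
```

In this toy data, restricting the comparison to the four outputs inside the window drops an older, high-scoring output. This raises the percentile from about 66.7% to 75%.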