SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing

Overview of attention for article published in BMC Genomics, November 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (73rd percentile)
  • Good Attention Score compared to outputs of the same age and source (79th percentile)

Mentioned by

  • 5 X users
  • 1 patent

Citations

  • 50 Dimensions

Readers on

  • 105 Mendeley

DOI 10.1186/s12864-016-3281-2
Authors

Jean-François Spinella, Pamela Mehanna, Ramon Vidal, Virginie Saillour, Pauline Cassart, Chantal Richer, Manon Ouimet, Jasmine Healy, Daniel Sinnett

Abstract

Next-generation sequencing (NGS) allows unbiased, in-depth interrogation of cancer genomes. Many somatic variant callers have been developed, yet accurate ascertainment of somatic variants remains a considerable challenge, as evidenced by the varying mutation call rates and low concordance among callers. Currently available statistical model-based algorithms perform well under ideal scenarios, such as high sequencing depth, homogeneous tumor samples, and high somatic variant allele frequency (VAF), but show limited performance with sub-optimal data such as low-pass whole-exome/genome sequencing data. While the goal of any cancer sequencing project is to identify a relevant, and limited, set of somatic variants for further sequence/functional validation, the inherently complex nature of cancer genomes, combined with technical issues directly related to sequencing and alignment, can affect the specificity and/or sensitivity of most callers. For these reasons, we developed SNooPer, a versatile machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data. To train the data-specific model, SNooPer uses a subset of variant positions from the sequencing output for which the class (true variation or sequencing error) is known. Here, using a real dataset of 40 childhood acute lymphoblastic leukemia patients, we show that the SNooPer algorithm is not affected by low coverage or low VAFs, and can be used to reduce overall sequencing costs while maintaining high specificity and sensitivity in somatic variant calling. When compared to three benchmarked somatic callers, SNooPer demonstrated the best overall performance.
The flexibility of SNooPer's random forest protects against technical bias and systematic errors, and is appealing in that it does not rely on user-defined parameters. The code and user guide can be downloaded at https://sourceforge.net/projects/snooper/.
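
The training idea described in the abstract (take a subset of positions whose class, true variation or sequencing error, is known, then fit a Random Forest on them) can be illustrated with a small, self-contained sketch. It is not SNooPer's code: the three features (VAF, mean base quality, strand bias), the synthetic training values, and every function name are assumptions made for illustration, and each tree is reduced to a one-split stump so that the example needs only the Python standard library.

```python
import random
from collections import Counter

# Hypothetical features per candidate position: (VAF, mean base quality, strand bias).
# Illustrative assumptions only; not the features SNooPer actually uses.

def make_synthetic_training_set(rng, n_per_class=40):
    """Build a toy labeled set: 'variant' = true variation, 'error' = sequencing error."""
    rows, labels = [], []
    for _ in range(n_per_class):
        rows.append((rng.uniform(0.15, 0.50), rng.uniform(30, 40), rng.uniform(0.0, 0.3)))
        labels.append("variant")
        rows.append((rng.uniform(0.01, 0.10), rng.uniform(10, 25), rng.uniform(0.5, 1.0)))
        labels.append("error")
    return rows, labels

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def train_forest(rows, labels, n_trees=25, features_per_tree=2, seed=0):
    """Bagging plus random feature subsets: each stump sees a bootstrap sample."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature_ids = rng.sample(range(n_features), features_per_tree)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                feature_ids))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

rows, labels = make_synthetic_training_set(random.Random(42))
forest = train_forest(rows, labels)
print(predict(forest, (0.35, 37.0, 0.1)))  # high VAF, high quality, low strand bias
print(predict(forest, (0.03, 15.0, 0.8)))  # low VAF, low quality, high strand bias
```

A real Random Forest grows full decision trees rather than stumps, and real training labels would come from validated positions in the sequencing output rather than from a random generator; the sketch only shows how bootstrap sampling, random feature subsets, and voting fit together.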

X Demographics

The data were collected from the profiles of the 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 105 Mendeley readers of this research output.

Geographical breakdown

Country      Count  % of readers
Netherlands      1  <1%
France           1  <1%
Unknown        103  98%

Demographic breakdown

Readers by professional status  Count  % of readers
Researcher                         19  18%
Student > Master                   17  16%
Student > Ph. D. Student           16  15%
Student > Bachelor                 12  11%
Student > Doctoral Student          9  9%
Other                              15  14%
Unknown                            17  16%

Readers by discipline                         Count  % of readers
Biochemistry, Genetics and Molecular Biology     28  27%
Agricultural and Biological Sciences             20  19%
Computer Science                                 12  11%
Engineering                                       5  5%
Medicine and Dentistry                            4  4%
Other                                            12  11%
Unknown                                          24  23%

Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 27 June 2019.
  • All research outputs: #5,441,264 of 22,901,818 outputs
  • Outputs from BMC Genomics: #2,152 of 10,674 outputs
  • Outputs of similar age: #80,996 of 307,484 outputs
  • Outputs of similar age from BMC Genomics: #46 of 225 outputs
Altmetric has tracked 22,901,818 research outputs across all sources so far. Compared to these, this one has done well and is in the 76th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 10,674 research outputs from this source. They receive a mean Attention Score of 4.7. This one has done well, scoring higher than 79% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 307,484 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 73% of its contemporaries.
We're also able to compare this research output to 225 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 79% of its contemporaries.
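
The percentile figures quoted above are consistent with a simple rank-to-percentile conversion: the share of tracked outputs ranked below this one, rounded down. The sketch below reproduces that arithmetic from the ranks listed on this page. The formula is an assumption inferred from the numbers, not Altmetric's documented method, and the function name is hypothetical.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Percentage of outputs ranked below `rank` out of `total`, rounded down.

    Assumed formula, inferred from the figures on this page rather than from
    Altmetric documentation. Integer arithmetic avoids floating-point rounding.
    """
    return (total - rank) * 100 // total

# (rank, total) pairs as listed on this page
contexts = {
    "All research outputs": (5_441_264, 22_901_818),
    "Outputs from BMC Genomics": (2_152, 10_674),
    "Outputs of similar age": (80_996, 307_484),
    "Outputs of similar age from BMC Genomics": (46, 225),
}

for name, (rank, total) in contexts.items():
    print(f"{name}: {percentile_from_rank(rank, total)}")
```

Under this assumption the four contexts come out at 76, 79, 73, and 79, matching the percentiles stated in the text.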