
SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing

Overview of attention for article published in BMC Genomics, November 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (73rd percentile)
  • High Attention Score compared to outputs of the same age and source (81st percentile)

Mentioned by

5 tweeters
1 patent

Citations

38 Dimensions

Readers on

93 Mendeley
Published in
BMC Genomics, November 2016
DOI 10.1186/s12864-016-3281-2
Pubmed ID
Authors

Jean-François Spinella, Pamela Mehanna, Ramon Vidal, Virginie Saillour, Pauline Cassart, Chantal Richer, Manon Ouimet, Jasmine Healy, Daniel Sinnett

Abstract

Next-generation sequencing (NGS) allows unbiased, in-depth interrogation of cancer genomes. Many somatic variant callers have been developed, yet accurate ascertainment of somatic variants remains a considerable challenge, as evidenced by the varying mutation call rates and low concordance among callers. Currently available statistical model-based algorithms perform well under ideal scenarios, such as high sequencing depth, homogeneous tumor samples and high somatic variant allele frequency (VAF), but show limited performance with sub-optimal data such as low-pass whole-exome/genome sequencing data. While the goal of any cancer sequencing project is to identify a relevant, and limited, set of somatic variants for further sequence/functional validation, the inherently complex nature of cancer genomes combined with technical issues directly related to sequencing and alignment can affect the specificity and/or sensitivity of most callers. For these reasons, we developed SNooPer, a versatile machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data. SNooPer trains a data-specific model on a subset of variant positions from the sequencing output whose class (true variation or sequencing error) is known. Here, using a real dataset of 40 childhood acute lymphoblastic leukemia patients, we show that the SNooPer algorithm is not affected by low coverage or low VAFs, and that it can be used to reduce overall sequencing costs while maintaining high specificity and sensitivity in somatic variant calling. When compared with three benchmarked somatic callers, SNooPer demonstrated the best overall performance.
The flexibility of SNooPer's random forest protects against technical bias and systematic errors, and is appealing in that it does not rely on user-defined parameters. The code and user guide can be downloaded at https://sourceforge.net/projects/snooper/.
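The core idea in the abstract, training a Random Forest on positions whose class is already known and then using it to label the remaining candidate positions, can be illustrated with a minimal, standard-library-only Python sketch. This is not SNooPer's implementation (its code is on SourceForge): the four per-position features (`tumor_vaf`, `normal_vaf`, `depth`, `mean_base_quality`), the toy training values, the class `RandomForest` and its hyperparameters are all hypothetical, chosen only to show bootstrap sampling, random feature subsets at each split and majority voting.

```python
import random
from collections import Counter

# HYPOTHETICAL per-position features, not SNooPer's actual feature set:
# (tumor_vaf, normal_vaf, depth, mean_base_quality)
# Toy labels: 1 = true somatic variant, 0 = sequencing error.
X_train = [
    (0.30, 0.00, 25, 36), (0.22, 0.00, 18, 35), (0.45, 0.01, 30, 37),
    (0.18, 0.00, 12, 34), (0.35, 0.00, 20, 38),
    (0.04, 0.00, 26, 22), (0.05, 0.02, 15, 20), (0.03, 0.00, 30, 24),
    (0.06, 0.01, 10, 19), (0.08, 0.03, 22, 21),
]
y_train = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, max_depth, n_feats, rng):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples."""
    if max_depth == 0 or len(set(y)) == 1:
        return majority(y)
    feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset per split
    split = best_split(X, y, feats)
    if split is None:
        return majority(y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_feats, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_feats, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=4, n_feats=2, seed=0):
        self.n_trees, self.max_depth, self.n_feats = n_trees, max_depth, n_feats
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.n_feats, self.rng))
        return self

    def predict(self, row):
        # Majority vote across all trees in the forest
        return majority([predict_tree(tree, row) for tree in self.trees])


forest = RandomForest().fit(X_train, y_train)
print(forest.predict((0.28, 0.00, 16, 35)))  # unlabeled candidate: high VAF, good quality
print(forest.predict((0.04, 0.01, 20, 21)))  # unlabeled candidate: low VAF, poor quality
```

Because each tree sees a different bootstrap sample and only a random subset of features at each split, no single noisy feature dominates the final vote. A real pipeline would more likely rely on an established implementation such as scikit-learn's `RandomForestClassifier` than on a hand-rolled forest like this one.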

Twitter Demographics

The Twitter demographic data were collected from the profiles of the 5 tweeters who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 93 Mendeley readers of this research output.

Geographical breakdown

Country (count, % of readers)
Netherlands: 1 (1%)
France: 1 (1%)
Unknown: 91 (98%)

Demographic breakdown

Readers by professional status (count, % of readers)
Researcher: 19 (20%)
Student > Master: 15 (16%)
Student > Ph. D. Student: 14 (15%)
Student > Bachelor: 10 (11%)
Student > Doctoral Student: 9 (10%)
Other: 15 (16%)
Unknown: 11 (12%)
Readers by discipline (count, % of readers)
Biochemistry, Genetics and Molecular Biology: 24 (26%)
Agricultural and Biological Sciences: 20 (22%)
Computer Science: 13 (14%)
Engineering: 4 (4%)
Medicine and Dentistry: 3 (3%)
Other: 11 (12%)
Unknown: 18 (19%)

Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 27 June 2019.
All research outputs: #4,298,101 of 18,005,056 outputs
Outputs from BMC Genomics: #1,848 of 9,506 outputs
Outputs of similar age: #79,767 of 298,181 outputs
Outputs of similar age from BMC Genomics: #161 of 875 outputs
Altmetric has tracked 18,005,056 research outputs across all sources so far. Compared with these, this one has done well and is in the 76th percentile: it is in the top 25% of all research outputs ever tracked by Altmetric.
So far, Altmetric has tracked 9,506 research outputs from this source. They have a mean Attention Score of 4.4. This one has done well, scoring higher than 80% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 298,181 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 73% of its contemporaries.
We're also able to compare this research output to 875 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 81% of its contemporaries.
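The four ranks listed for this output translate into the quoted percentiles (76th, 80th, 73rd and 81st) with simple arithmetic. The Python sketch below assumes the percentile is the share of tracked outputs ranked below this one, rounded down; this page does not state Altmetric's actual formula, so the hypothetical helper `percentile_from_rank` is only an illustration that happens to reproduce the figures shown.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Whole-number percentile from a 1-based rank (ASSUMED formula).

    Computes floor(100 * (total - rank) / total) with integer arithmetic,
    i.e. the share of outputs ranked below this one, rounded down.
    """
    return (total - rank) * 100 // total


# Ranks and pool sizes as listed on this page
rankings = {
    "All research outputs": (4_298_101, 18_005_056),
    "Outputs from BMC Genomics": (1_848, 9_506),
    "Outputs of similar age": (79_767, 298_181),
    "Outputs of similar age from BMC Genomics": (161, 875),
}

for context, (rank, total) in rankings.items():
    p = percentile_from_rank(rank, total)
    print(f"{context}: #{rank:,} of {total:,} -> percentile {p}")
# percentiles printed, in order: 76, 80, 73, 81
```

Rounding down matters here: rounding to the nearest integer would give 81 and 82 for the BMC Genomics comparisons, which does not match the 80% and 81% stated above.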