tarSVM: Improving the accuracy of variant calls derived from microfluidic PCR-based targeted next generation sequencing using a support vector machine

Overview of attention for article published in BMC Bioinformatics, June 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by: 4 X users

Citations (Dimensions): 2

Readers on Mendeley: 44
Published in: BMC Bioinformatics, June 2016
DOI: 10.1186/s12859-016-1108-4
Pubmed ID:
Authors:

Christopher E. Gillies, Edgar A. Otto, Virginia Vega-Warner, Catherine C. Robertson, Simone Sanna-Cherchi, Ali Gharavi, Brendan Crawford, Rajendra Bhimma, Cheryl Winkler, Nephrotic Syndrome Study Network (NEPTUNE), C-PROBE Investigator Group of the Michigan Kidney Translational Core Center, Hyun Min Kang, Matthew G. Sampson

Abstract

Targeted sequencing of discrete gene sets is a cost-effective strategy to screen subjects for monogenic forms of disease. One method to achieve this pairs microfluidic PCR with next generation sequencing. The PCR step of this pipeline complicates accurate variant calling; for example, most reads targeting a specific exon are duplicates amplified during PCR. To reduce false positive variant calls from these experiments, previous studies have used threshold-based filtering of the alternative allele depth ratio and manual inspection of the alignments. However, even after manual inspection and filtering, many variants fail to be validated by Sanger sequencing. Improving the accuracy of variant calling from these experiments therefore requires a variant filtering strategy that adequately models the issues specific to microfluidic PCR. We developed an open source variant filtering pipeline, targeted sequencing support vector machine ("tarSVM"), that uses a support vector machine (SVM) and a new score, the normalized allele dosage test, to identify high quality variants from microfluidic PCR data. tarSVM maximizes the available training data by selecting likely true and likely false variants using knowledge from the 1000 Genomes Project and the Exome Aggregation Consortium (ExAC). tarSVM improves on previous approaches by combining variant features from the Genome Analysis Toolkit (GATK) with allele dosage information. We compared the accuracy of tarSVM with existing variant quality filtering strategies on two cohorts (n = 474 and n = 1152) and validated our method on a third cohort (n = 75). In the first cohort, our method achieved 84.5 % accuracy in predicting whether a variant would be validated by Sanger sequencing, versus 78.8 % for the second most accurate method. In the second cohort, our method had an accuracy of 73.3 %, versus 61.5 % for the second best method.
Finally, our method had a false discovery rate of 5 % for the validation cohort. tarSVM increases the accuracy of variant calling when using microfluidic PCR-based targeted sequencing approaches. This results in higher confidence downstream analyses and ultimately reduces the cost of Sanger validation. Our approach is less labor intensive than existing approaches and is available as an open source pipeline for read trimming, alignment, variant calling, and variant quality filtering on GitHub at https://github.com/christopher-gillies/TargetSpecificGATKSequencingPipeline .
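
The abstract contrasts a fixed cutoff on the alternative allele depth ratio (the earlier approach) with an SVM-based filter, and scores filters by accuracy and false discovery rate. This summary does not give tarSVM's feature set, kernel, the formula for the normalized allele dosage test, or the rules for picking likely true and likely false variants from 1000 Genomes and ExAC, so the sketch below uses generic stand-ins under stated assumptions: a ratio cutoff of 0.2, a linear SVM trained with the Pegasos subgradient method (a standard solver swapped in here, not necessarily what tarSVM uses), and two hypothetical features. All function names, features, and toy values are illustrative, not tarSVM's code or data.

```python
import random


def alt_allele_ratio(alt_depth, total_depth):
    """Fraction of reads at a site that support the alternative allele."""
    return alt_depth / total_depth if total_depth > 0 else 0.0


def passes_ratio_threshold(alt_depth, total_depth, min_ratio=0.2):
    """Baseline filter: keep a call only if its alt-allele ratio reaches a fixed cutoff.

    min_ratio=0.2 is an illustrative assumption, not a cutoff taken from the paper.
    """
    return alt_allele_ratio(alt_depth, total_depth) >= min_ratio


def train_linear_svm(X, y, lam=0.01, epochs=2000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    A constant 1.0 feature is appended, so the last weight acts as a
    (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """+1 = keep the call (expected to validate), -1 = filter it out."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def accuracy(pred, truth):
    """Fraction of calls whose prediction matches the validation outcome."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def false_discovery_rate(pred, truth):
    """Among calls kept (+1), the fraction that failed validation (-1)."""
    kept = [t for p, t in zip(pred, truth) if p == 1]
    return sum(t == -1 for t in kept) / len(kept) if kept else 0.0


# Toy training set with made-up values (hypothetical features):
# [alt-allele ratio, quality rescaled to 0..1]; +1 likely true, -1 likely false.
X_train = [(0.48, 0.90), (0.52, 0.80), (0.45, 0.85), (0.55, 0.95), (0.98, 0.90),
           (0.08, 0.30), (0.12, 0.20), (0.15, 0.40), (0.05, 0.10)]
y_train = [1, 1, 1, 1, 1, -1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
pred = [predict(w, x) for x in X_train]
print("training accuracy:", accuracy(pred, y_train))
print("false discovery rate:", false_discovery_rate(pred, y_train))
```

With real per-variant features and Sanger outcomes in place of the toy lists, the same `accuracy` and `false_discovery_rate` helpers could compare the SVM against the ratio cutoff. Appending a constant feature regularizes the bias along with the weights, a common simplification in Pegasos-style solvers.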

X Demographics

The data shown below were collected from the profiles of 4 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 44 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Poland 1 2%
Unknown 43 98%

Demographic breakdown

Readers by professional status Count As %
Researcher 11 25%
Student > Ph. D. Student 10 23%
Student > Bachelor 5 11%
Student > Master 3 7%
Professor > Associate Professor 3 7%
Other 2 5%
Unknown 10 23%
Readers by discipline Count As %
Biochemistry, Genetics and Molecular Biology 11 25%
Agricultural and Biological Sciences 8 18%
Medicine and Dentistry 4 9%
Engineering 2 5%
Immunology and Microbiology 2 5%
Other 5 11%
Unknown 12 27%
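
Each "As %" value in the tables above is the category count divided by the 44 Mendeley readers, expressed as a whole percent. The minimal sketch below reproduces that arithmetic, assuming round-to-nearest (the page does not state its rounding rule); `percent_breakdown` is a hypothetical helper name.

```python
def percent_breakdown(counts):
    """Map each category to its share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts copied from the "Readers by professional status" table.
professional_status = {
    "Researcher": 11,
    "Student > Ph. D. Student": 10,
    "Student > Bachelor": 5,
    "Student > Master": 3,
    "Professor > Associate Professor": 3,
    "Other": 2,
    "Unknown": 10,
}
shares = percent_breakdown(professional_status)
print(shares)
print(sum(shares.values()))  # independently rounded shares need not total 100
```

Because each share is rounded independently, the professional-status percentages add up to 101 % rather than 100 %; the geographical and discipline breakdowns happen to total exactly 100 %.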
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 12 June 2016.
All research outputs: #14,717,488 of 23,577,761 outputs
Outputs from BMC Bioinformatics: #4,823 of 7,418 outputs
Outputs of similar age: #197,823 of 347,291 outputs
Outputs of similar age from BMC Bioinformatics: #62 of 95 outputs
Altmetric has tracked 23,577,761 research outputs across all sources so far. This one is in the 35th percentile – i.e., 35% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,418 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 30th percentile – i.e., 30% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 347,291 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 40th percentile – i.e., 40% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 95 others from the same source and published within six weeks on either side of this one. This one is in the 29th percentile – i.e., 29% of its contemporaries scored the same or lower than it.
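
Each percentile statement above uses the same definition: the share of a comparison group whose Attention Score is the same as or lower than this output's. The minimal sketch below implements that definition with a hypothetical function and made-up peer scores. Altmetric's exact procedure (for example, how it handles ties or rounds to a whole percentile) is not described on this page, so the sketch is not expected to reproduce the figures above.

```python
from bisect import bisect_right


def percentile_same_or_lower(score, peer_scores):
    """Percentage of peer outputs whose score is less than or equal to `score`."""
    ranked = sorted(peer_scores)
    # bisect_right counts every peer with an equal score as "same or lower".
    return 100.0 * bisect_right(ranked, score) / len(ranked)


# Made-up peer scores, for illustration only.
peers = [0, 1, 2, 2, 3, 5, 8, 13, 21, 34]
print(percentile_same_or_lower(2, peers))
```

Using `bisect_right` on a sorted list keeps each lookup logarithmic once the peer scores are sorted, which matters when the comparison group has millions of outputs.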