
A hybrid computational strategy to address WGS variant analysis in >5000 samples

Overview of attention for article published in BMC Bioinformatics, September 2016

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (93rd percentile)
  • High Attention Score compared to outputs of the same age and source (98th percentile)

Mentioned by

  • 4 news outlets
  • 11 X users

Citations

  • 8 Dimensions

Readers on

  • 52 Mendeley
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.
Title: A hybrid computational strategy to address WGS variant analysis in >5000 samples
Published in: BMC Bioinformatics, September 2016
DOI: 10.1186/s12859-016-1211-6
Authors: Zhuoyi Huang, Navin Rustagi, Narayanan Veeraraghavan, Andrew Carroll, Richard Gibbs, Eric Boerwinkle, Manjunath Gorentla Venkata, Fuli Yu

Abstract

The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling altogether. We present a high-throughput framework that includes multiple variant callers for single nucleotide variant (SNV) calling and leverages a hybrid computing infrastructure consisting of the AWS cloud, supercomputers and local high-performance computing infrastructure. We also present a novel binning approach for large-scale joint variant calling and imputation that scales to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results from the analysis of the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers: SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, the IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred a total of only 6 TB of data across the platforms.
Even as whole genome datasets grow in size, ensemble joint calling of SNVs for low-coverage data can be accomplished in a scalable, cost-effective and fast manner by using heterogeneous computing platforms, without compromising the quality of variants.
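The abstract's headline figures can be related with some back-of-the-envelope arithmetic. The Python sketch below uses only the numbers quoted above and is not taken from the paper; it assumes decimal terabytes, treats "under 6 weeks" as an upper bound of 6 × 7 × 24 = 1,008 wall-clock hours, and treats "over 5300" as a lower bound on the sample count, so the derived averages are bounds rather than reported results. The constant names and the `summarize` helper are illustrative only.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
# Assumptions (not from the paper): decimal TB, 6 weeks = 1,008 wall-clock hours
# as an upper bound on runtime, 5,300 as a lower bound on the sample count.

BAM_TB = 180.0            # BAM data jointly called on AWS
TRANSFERRED_TB = 6.0      # total data moved between platforms
CORE_HOURS = 5.2e6        # total compute for the whole operation
SAMPLES = 5300            # "over 5300" whole genome samples
WALL_HOURS = 6 * 7 * 24   # "under 6 weeks" expressed in hours


def summarize():
    # Data moved between platforms relative to the size of the BAM input.
    transfer_ratio = TRANSFERRED_TB / BAM_TB
    # Average number of busy cores if the run took the full 6 weeks;
    # a shorter run means more cores, so this is a lower bound.
    min_avg_cores = CORE_HOURS / WALL_HOURS
    # Core hours per sample; more than 5,300 samples means fewer per sample,
    # so this is an upper bound.
    max_core_hours_per_sample = CORE_HOURS / SAMPLES
    return transfer_ratio, min_avg_cores, max_core_hours_per_sample


if __name__ == "__main__":
    ratio, cores, per_sample = summarize()
    print(f"data moved vs. BAM input: {ratio:.1%}")
    print(f"average cores busy (lower bound): {cores:,.0f}")
    print(f"core hours per sample (upper bound): {per_sample:,.0f}")
```

Under these assumptions, the data moved between platforms is about 3.3% of the BAM input, and spending 5.2 million core hours in under six weeks implies more than 5,000 cores busy on average. That suggests the 4000-core in-house cluster alone (4,000 × 1,008 ≈ 4.0 million core hours over six weeks) could not have absorbed the workload in that time.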

X Demographics

The X demographics data were collected from the profiles of 11 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 52 Mendeley readers of this research output.

Geographical breakdown

Country (count, as %)
  • Netherlands: 1 (2%)
  • Unknown: 51 (98%)

Demographic breakdown

Readers by professional status (count, as %)
  • Researcher: 14 (27%)
  • Student > Ph. D. Student: 7 (13%)
  • Student > Master: 7 (13%)
  • Student > Bachelor: 6 (12%)
  • Student > Doctoral Student: 4 (8%)
  • Other: 7 (13%)
  • Unknown: 7 (13%)
Readers by discipline (count, as %)
  • Agricultural and Biological Sciences: 11 (21%)
  • Computer Science: 11 (21%)
  • Biochemistry, Genetics and Molecular Biology: 8 (15%)
  • Engineering: 4 (8%)
  • Medicine and Dentistry: 3 (6%)
  • Other: 8 (15%)
  • Unknown: 7 (13%)
Attention Score in Context

This research output has an Altmetric Attention Score of 32. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 September 2016.
  • All research outputs: #1,209,971 of 25,139,853 outputs
  • Outputs from BMC Bioinformatics: #125 of 7,654 outputs
  • Outputs of similar age: #21,649 of 333,402 outputs
  • Outputs of similar age from BMC Bioinformatics: #3 of 126 outputs
Altmetric has tracked 25,139,853 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 95th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,654 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 98% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 333,402 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 93% of its contemporaries.
We're also able to compare this research output to 126 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 98% of its contemporaries.
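The rankings listed above can be turned into approximate percentiles. The Python sketch below is a minimal illustration, assuming that a percentile is simply the share of tracked outputs ranked below this one and ignoring ties; Altmetric's own tie-handling and rounding rules are not described on this page, so the hypothetical `share_ranked_below` helper is an approximation, not Altmetric's method.

```python
# Approximate percentiles from the rank positions reported on this page.
# Assumption: percentile = share of tracked outputs ranked below this one,
# ignoring ties (Altmetric's exact tie and rounding rules are not stated here).

RANKINGS = {
    "All research outputs": (1_209_971, 25_139_853),
    "Outputs from BMC Bioinformatics": (125, 7_654),
    "Outputs of similar age": (21_649, 333_402),
    "Outputs of similar age from BMC Bioinformatics": (3, 126),
}


def share_ranked_below(rank: int, total: int) -> float:
    """Percentage of outputs with a worse (larger) rank than `rank`."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return 100.0 * (total - rank) / total


if __name__ == "__main__":
    for label, (rank, total) in RANKINGS.items():
        print(f"{label}: {share_ranked_below(rank, total):.1f}%")
```

Under this assumption the shares come to roughly 95.2%, 98.4%, 93.5% and 97.6%, which agree with the 95th, 98th, 93rd and 98th percentiles quoted above to within about one percentage point; the small differences may come from tie handling or rounding.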