
Fast imputation using medium or low-coverage sequence data

Overview of attention for article published in BMC Genomic Data, July 2015

Mentioned by: 2 X users
Citations: 72 (Dimensions)
Readers: 86 on Mendeley, 1 on CiteULike

You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.
Title: Fast imputation using medium or low-coverage sequence data
Published in: BMC Genomic Data, July 2015
DOI: 10.1186/s12863-015-0243-7
Pubmed ID
Authors: Paul M. VanRaden, Chuanyu Sun, Jeffrey R. O’Connell

Abstract

Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth with array genotypes of varying density. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and, as each individual's sequence is processed, updates posterior allele probabilities from the prior probabilities within those two haplotypes. Directly using allele read counts can improve imputation accuracy and reduce computation compared with first calling genotypes or computing genotype probabilities and then imputing. A new algorithm was implemented in findhap (version 4) software and tested using simulated bovine and actual human sequence data with different combinations of reference population size, sequence read depth and error rate. Read depths of ≥8× may be desired for direct investigation of sequenced individuals, but for a given total cost, sequencing more individuals at read depths of 2× to 4× gave more accurate imputation from array genotypes. Imputation accuracy improved further if reference individuals had both low-coverage sequence and high-density (HD) microarray data, and it remained high even with a read error rate of 16%. With read depths of ≤4×, findhap (version 4) was more accurate than Beagle (version 4) and ran up to 400 times faster. For 10,000 sequenced individuals plus 250 with HD array genotypes to test imputation, findhap required 7 hours on 10 processors and 50 GB of memory for 1 million loci on one chromosome. Computing time increased in proportion to population size but less than proportionally with the number of variants. Simultaneous genotype calling from low-coverage sequence data and imputation from array genotypes of various densities is done very efficiently within findhap by updating allele probabilities within the two haplotypes for each individual.
Accuracy of genotype calling and imputation was high with both simulated bovine and actual human genomes reduced to low-coverage sequence and HD microarray data. More efficient imputation allows geneticists to locate and test the effects of more DNA variants from more individuals and to include those variants in future prediction and selection.
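The abstract's central idea is to update allele probabilities within an individual's two chosen haplotypes directly from allele read counts. A small Bayesian calculation can illustrate that idea. The sketch below is not findhap's code. It assumes a biallelic locus, that each read comes from either haplotype with probability 1/2, and that each read reports the wrong allele with a fixed error rate. The names `posterior_haplotype_b`, `read_prob_b` and `prob_het_shows_both` are hypothetical.

The last function ignores read errors. It suggests why low depth alone rarely reveals heterozygotes: with exactly d reads, a heterozygote shows both alleles with probability 1 − 2(0.5)^d.

```python
"""Toy Bayesian update of two haplotype allele probabilities from read counts.

A minimal sketch, NOT findhap's implementation. Assumptions:
  * biallelic locus with alleles A (coded 0) and B (coded 1);
  * each read comes from either of the two haplotypes with probability 1/2;
  * each read independently reports the wrong allele with rate `error`,
    where 0 < error < 0.5.
"""
from itertools import product


def read_prob_b(allele: int, error: float) -> float:
    """P(a single read shows allele B | the haplotype carries `allele`)."""
    return 1.0 - error if allele == 1 else error


def posterior_haplotype_b(p1: float, p2: float, n_a: int, n_b: int,
                          error: float) -> tuple[float, float]:
    """Posterior P(carries B) for each of the individual's two haplotypes.

    p1, p2: prior P(carries B) for haplotype 1 and haplotype 2.
    n_a, n_b: numbers of reads showing allele A and allele B.
    The binomial coefficient is identical for every (h1, h2) configuration,
    so it cancels on normalization and is omitted.
    """
    priors = ((1.0 - p1, p1), (1.0 - p2, p2))  # indexed [haplotype][allele]
    joint = {}
    for h1, h2 in product((0, 1), repeat=2):
        # Per-read probability of seeing B, averaged over the two haplotypes.
        q = 0.5 * read_prob_b(h1, error) + 0.5 * read_prob_b(h2, error)
        likelihood = q ** n_b * (1.0 - q) ** n_a
        joint[(h1, h2)] = priors[0][h1] * priors[1][h2] * likelihood
    total = sum(joint.values())
    post1 = (joint[(1, 0)] + joint[(1, 1)]) / total  # haplotype 1 carries B
    post2 = (joint[(0, 1)] + joint[(1, 1)]) / total  # haplotype 2 carries B
    return post1, post2


def prob_het_shows_both(depth: int) -> float:
    """P(a heterozygote read exactly `depth` times shows both alleles), no errors."""
    if depth < 1:
        return 0.0
    return 1.0 - 2.0 * 0.5 ** depth


if __name__ == "__main__":
    for d in (2, 4, 8):
        print(f"{d}x: P(both alleles seen) = {prob_het_shows_both(d):.4f}")
    # Hypothetical priors: haplotype 1 likely carries B, haplotype 2 likely A.
    print(posterior_haplotype_b(0.9, 0.1, n_a=1, n_b=1, error=0.01))
```

The depth loop prints 0.5000, 0.8750 and 0.9922 for 2×, 4× and 8×. Those values are consistent in spirit with the abstract's point: 8× rarely misses a heterozygote, while 2× to 4× data need imputation to fill in what the reads alone cannot resolve. With no reads, the posterior equals the prior. Each additional read pulls both haplotype probabilities toward the allele it shows.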

X Demographics

The X demographic data were collected from the profiles of the 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 86 Mendeley readers of this research output.

Geographical breakdown

Readers by country, count (% of 86):
New Zealand: 1 (1%)
Poland: 1 (1%)
Denmark: 1 (1%)
Unknown: 83 (97%)

Demographic breakdown

Readers by professional status, count (% of 86):
Student > Ph. D. Student: 21 (24%)
Researcher: 17 (20%)
Student > Master: 9 (10%)
Student > Bachelor: 5 (6%)
Student > Postgraduate: 5 (6%)
Other: 13 (15%)
Unknown: 16 (19%)

Readers by discipline, count (% of 86):
Agricultural and Biological Sciences: 43 (50%)
Biochemistry, Genetics and Molecular Biology: 12 (14%)
Computer Science: 5 (6%)
Unspecified: 2 (2%)
Veterinary Science and Veterinary Medicine: 2 (2%)
Other: 3 (3%)
Unknown: 19 (22%)
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 25 March 2016.
All research outputs: #20,655,488 of 25,371,288 outputs
Outputs from BMC Genomic Data: #861 of 1,204 outputs
Outputs of similar age: #202,226 of 276,407 outputs
Outputs of similar age from BMC Genomic Data: #33 of 46 outputs
Altmetric has tracked 25,371,288 research outputs across all sources so far. This one is in the 10th percentile, i.e., 10% of other outputs scored the same as or lower than it.
So far Altmetric has tracked 1,204 research outputs from this source. They have a mean Attention Score of 4.3. This one is in the 16th percentile, i.e., 16% of its peers scored the same as or lower than it.
Older research outputs will score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score with the 276,407 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 14th percentile, i.e., 14% of its contemporaries scored the same as or lower than it.
We can also compare this research output with 46 others from the same source published within six weeks on either side of this one. This one is in the 19th percentile, i.e., 19% of its contemporaries scored the same as or lower than it.