Novel methods for genotype imputation to whole-genome sequence and a simple linear model to predict imputation accuracy

Overview of attention for article published in BMC Genomic Data, December 2017

Mentioned by: 1 X user
Citations: 11 (Dimensions)
Readers: 49 (Mendeley)
Title: Novel methods for genotype imputation to whole-genome sequence and a simple linear model to predict imputation accuracy
Published in: BMC Genomic Data, December 2017
DOI: 10.1186/s12863-017-0588-1
Authors: Steven G. Larmer, Mehdi Sargolzaei, Luiz F. Brito, Ricardo V. Ventura, Flávio S. Schenkel

Abstract

Accurate imputation plays a major role in genomic studies in livestock industries, where the number of genotyped or sequenced animals is limited by cost. This study explored methods to create an ideal reference population for imputation to next-generation sequencing data in cattle.

Methods for clustering animals for imputation were explored using 1000 Bull Genomes Project sequence data on 1146 animals from a variety of beef and dairy breeds. Imputation from 50 K to 777 K was first carried out to choose an ideal clustering method, using the ADMIXTURE or PLINK clustering algorithms with either genotypes or reconstructed haplotypes. Because of its efficiency, accuracy and ease of use, clustering with PLINK using haplotypes as quasi-genotypes was chosen as the most advantageous grouping method. Using a clustered population slightly decreased computing time while maintaining accuracy across the population. Although overall accuracy remained the same, a slight increase in accuracy was observed for groups of animals in some breeds (primarily purebred beef cattle from breeds with fewer sequenced animals), and a slight decrease was observed for other groups, primarily crossbred animals. However, some animals in each breed were poorly imputed across all methods. When imputed sequences were included in the reference population to aid imputation of poorly imputed animals, a small increase in overall accuracy was observed for nearly every individual in the population. Two models were created to predict imputation accuracy: a complete model using all available information, including Euclidean distances from genotypes and haplotypes, pedigree information, and clustering groups; and a simple model using only breed and a Euclidean distance matrix as predictors. Both models were successful in predicting imputation accuracy, with correlations between predicted and true imputation accuracy (measured as concordance rate) of 0.87 and 0.83, respectively.

A clustering methodology can be very useful to subgroup cattle for efficient genotype imputation. In addition, the accuracy of genotype imputation from medium- to high-density single nucleotide polymorphism (SNP) chip panels to whole-genome sequence can be predicted well using a simple linear model defined in this study.
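
The abstract names three quantities that are easy to confuse: the concordance rate (how imputation accuracy is measured), a Euclidean distance between animals, and the correlation between predicted and true accuracy. The Python sketch below illustrates them with a least-squares model of the form accuracy = breed intercept + slope × distance. It is only an illustration under stated assumptions, not the authors' model: the function names, the 0/1/2 genotype coding, the use of a single distance value per animal (the study used a distance matrix), and the toy numbers are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def concordance_rate(true_calls, imputed_calls):
    """Share of genotype calls (assumed coded 0, 1 or 2) that match exactly."""
    if len(true_calls) != len(imputed_calls) or not true_calls:
        raise ValueError("need two non-empty call vectors of equal length")
    matches = sum(t == i for t, i in zip(true_calls, imputed_calls))
    return matches / len(true_calls)


def euclidean_distance(a, b):
    """Euclidean distance between two genotype vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_simple_model(breeds, distances, accuracies):
    """Least-squares fit of accuracy = intercept[breed] + slope * distance.

    Centering distance and accuracy within each breed yields the same slope
    as regressing on breed dummy variables plus distance.
    """
    by_breed = defaultdict(list)
    for breed, dist, acc in zip(breeds, distances, accuracies):
        by_breed[breed].append((dist, acc))
    means = {
        breed: (sum(d for d, _ in rows) / len(rows),
                sum(a for _, a in rows) / len(rows))
        for breed, rows in by_breed.items()
    }
    sxy = sum((d - means[b][0]) * (a - means[b][1])
              for b, d, a in zip(breeds, distances, accuracies))
    sxx = sum((d - means[b][0]) ** 2 for b, d in zip(breeds, distances))
    slope = sxy / sxx  # requires distance to vary within at least one breed
    intercepts = {b: mean_a - slope * mean_d
                  for b, (mean_d, mean_a) in means.items()}
    return intercepts, slope


def predict(model, breed, distance):
    """Predicted accuracy for one animal from the fitted (intercepts, slope)."""
    intercepts, slope = model
    return intercepts[breed] + slope * distance


if __name__ == "__main__":
    # Toy values invented for illustration only; not data from the study.
    breeds = ["A", "A", "A", "B", "B", "B"]
    distances = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
    accuracies = [0.98, 0.95, 0.94, 0.93, 0.92, 0.89]
    model = fit_simple_model(breeds, distances, accuracies)
    predicted = [predict(model, b, d) for b, d in zip(breeds, distances)]
    print("slope:", model[1])
    print("correlation, predicted vs true:", pearson(predicted, accuracies))
```

In this sketch the correlation printed at the end plays the role of the 0.87 and 0.83 figures reported in the abstract: it compares predicted accuracy with observed concordance rate across animals.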

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 49 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 49 100%

Demographic breakdown

Readers by professional status Count As %
Researcher 10 20%
Student > Ph. D. Student 6 12%
Student > Master 6 12%
Other 5 10%
Student > Doctoral Student 4 8%
Other 10 20%
Unknown 8 16%
Readers by discipline Count As %
Agricultural and Biological Sciences 20 41%
Biochemistry, Genetics and Molecular Biology 11 22%
Medicine and Dentistry 3 6%
Veterinary Science and Veterinary Medicine 2 4%
Social Sciences 1 2%
Other 1 2%
Unknown 11 22%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 29 December 2017.
All research outputs: #22,764,772 of 25,382,440 outputs
Outputs from BMC Genomic Data: #1,008 of 1,204 outputs
Outputs of similar age: #388,718 of 449,047 outputs
Outputs of similar age from BMC Genomic Data: #19 of 24 outputs
Altmetric has tracked 25,382,440 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,204 research outputs from this source. They receive a mean Attention Score of 4.3. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 449,047 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 24 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.