
Does encoding matter? A novel view on the quantitative genetic trait prediction problem

Overview of attention for article published in BMC Bioinformatics, July 2016

Mentioned by: 1 X user

Citations: 14 (Dimensions)

Readers on Mendeley: 20
Published in: BMC Bioinformatics, July 2016
DOI: 10.1186/s12859-016-1127-1
Pubmed ID:
Authors: Dan He, Laxmi Parida

Abstract

Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model, which requires quantitative encodings of the genotypes. Much work has been done on prediction algorithms, but no existing work has investigated the effects of the encodings on the genetic trait prediction problem. In this work, we view the genetic trait prediction problem from a novel angle: as a multiple regression problem on categorical data, which requires encoding the categorical data into numerical data. We further propose two novel encoding methods and show that they generate numerical features with higher predictive power. Our experiments show that our methods are superior to the other encoding methods for both the single marker model and the epistasis model, and that the quantitative genetic trait prediction problem depends heavily on the encoding of genotypes under both models. We also conduct a detailed analysis of the performance of the hybrid encodings. To our knowledge, this is the first work that discusses the effects of encodings on the genetic trait prediction problem.
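To make the idea of "encoding categorical genotypes into numerical data" concrete, here is a minimal sketch. It does not implement either of the paper's two proposed encodings, which the abstract does not describe. Instead it contrasts two standard encodings, additive allele counts (0/1/2) and one-hot indicators per genotype class, under ridge regression. The function names, the ridge penalty `lam`, and the synthetic data (where one marker acts only through heterozygotes) are all hypothetical assumptions chosen for illustration.

```python
import numpy as np


def additive_encoding(genotypes):
    """Additive coding: each genotype is the count (0, 1 or 2) of one allele."""
    return genotypes.astype(float)


def one_hot_encoding(genotypes):
    """Categorical coding: expand each marker into three 0/1 indicator
    columns, one per genotype class {0, 1, 2}."""
    n_samples, n_markers = genotypes.shape
    encoded = np.zeros((n_samples, 3 * n_markers))
    for j in range(n_markers):
        for level in range(3):
            encoded[:, 3 * j + level] = genotypes[:, j] == level
    return encoded


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def holdout_mse(encode, G, y, n_train):
    """Fit on the first n_train samples and return mean squared error on the rest."""
    X = encode(G)
    w, b = ridge_fit(X[:n_train], y[:n_train])
    residuals = y[n_train:] - (X[n_train:] @ w + b)
    return float(np.mean(residuals ** 2))


# Hypothetical synthetic data: 200 samples, 5 biallelic markers.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 5))
# Marker 0 acts only through heterozygotes (a non-additive effect);
# marker 1 has a purely additive effect; the rest have no effect.
y = 2.0 * (G[:, 0] == 1) + 0.5 * G[:, 1] + rng.normal(0.0, 0.1, size=200)

mse_additive = holdout_mse(additive_encoding, G, y, n_train=150)
mse_one_hot = holdout_mse(one_hot_encoding, G, y, n_train=150)
print(f"additive MSE: {mse_additive:.3f}, one-hot MSE: {mse_one_hot:.3f}")
```

On this toy trait the additive coding cannot represent the heterozygote-only effect of marker 0, so its held-out error should be noticeably larger than that of the one-hot coding. On a purely additive trait, the one-hot coding can still represent the same effects but spends three parameters per marker instead of one.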

X Demographics


The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers


The data shown below were compiled from readership statistics for 20 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 20 100%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 5 25%
Student > Master 4 20%
Student > Doctoral Student 3 15%
Researcher 2 10%
Professor 1 5%
Other 3 15%
Unknown 2 10%
Readers by discipline Count As %
Agricultural and Biological Sciences 5 25%
Biochemistry, Genetics and Molecular Biology 5 25%
Computer Science 3 15%
Veterinary Science and Veterinary Medicine 1 5%
Economics, Econometrics and Finance 1 5%
Other 3 15%
Unknown 2 10%
Attention Score in Context


This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 July 2016.
All research outputs: #20,335,770 of 22,880,691 outputs
Outputs from BMC Bioinformatics: #6,872 of 7,298 outputs
Outputs of similar age: #317,189 of 363,105 outputs
Outputs of similar age from BMC Bioinformatics: #95 of 108 outputs
Altmetric has tracked 22,880,691 research outputs across all sources so far. This one is in the 1st percentile, i.e., 1% of other outputs scored the same as or lower than it.
So far Altmetric has tracked 7,298 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile, i.e., 1% of its peers scored the same as or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 363,105 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile, i.e., 1% of its contemporaries scored the same as or lower than it.
We're also able to compare this research output to 108 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile, i.e., 1% of its contemporaries scored the same as or lower than it.