Report for: Does encoding matter? A novel view on the quantitative genetic trait prediction problem


Title	Does encoding matter? A novel view on the quantitative genetic trait prediction problem
Published in	BMC Bioinformatics, July 2016
DOI	10.1186/s12859-016-1127-1
Pubmed ID	27454886
Authors	Dan He, Laxmi Parida
Abstract	Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as a linear regression model, which requires quantitative encodings for the genotypes. There is a large body of work on prediction algorithms, but none of it has investigated the effects of the encodings on the genetic trait prediction problem. In this work, we view the genetic trait prediction problem from a novel angle: as a multiple regression problem on categorical data, which requires encoding the categorical data as numerical data. We further propose two novel encoding methods and show that they are able to generate numerical features with higher predictive power. Our experiments show that our methods are superior to the other encoding methods for both the single-marker model and the epistasis model. We show that the quantitative genetic trait prediction problem depends heavily on the encoding of genotypes, for both the single-marker model and the epistasis model. We conducted a detailed analysis of the performance of the hybrid encodings. To our knowledge, this is the first work that discusses the effects of encodings on the genetic trait prediction problem.
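
To make the abstract's framing concrete, the following is a minimal sketch of what "encoding" means here. It is not the authors' method: the two novel encodings are not described on this page. It only contrasts two common baselines, an additive allele count (0/1/2) and a one-hot categorical encoding, and fits a ridge regression on each. The genotype matrix, the trait values, the function names, and the regularization strength are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are biallelic markers.
genotypes = np.array([
    ["AA", "Aa", "aa"],
    ["Aa", "Aa", "AA"],
    ["aa", "AA", "Aa"],
    ["AA", "aa", "aa"],
])
# Hypothetical quantitative trait values, one per sample.
trait = np.array([1.2, 0.7, 0.9, 1.5])


def additive_encoding(g):
    """Encode each genotype as the number of copies of allele 'a' (0, 1 or 2)."""
    return np.char.count(g, "a").astype(float)


def one_hot_encoding(g, categories=("AA", "Aa", "aa")):
    """Treat each genotype as a category: one 0/1 indicator per (genotype, marker)."""
    return np.concatenate([(g == c).astype(float) for c in categories], axis=1)


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression weights: (X^T X + lam * I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


X_add = additive_encoding(genotypes)   # shape (4, 3): one feature per marker
X_hot = one_hot_encoding(genotypes)    # shape (4, 9): three features per marker

w_add = ridge_fit(X_add, trait)
w_hot = ridge_fit(X_hot, trait)

print("additive features:", X_add.shape, "weights:", w_add.round(3))
print("one-hot features: ", X_hot.shape, "weights:", w_hot.round(3))
```

The same samples produce different feature matrices, and therefore different fitted models, depending only on the encoding. That dependence is the effect the abstract says it studies.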


X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Geographical breakdown

Country	Count	As %
Unknown	1	100%

Demographic breakdown

Type	Count	As %
Scientists	1	100%

Mendeley readers

The data shown below were compiled from readership statistics for 20 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	20	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	5	25%
Student > Master	4	20%
Student > Doctoral Student	3	15%
Researcher	2	10%
Professor	1	5%
Other	3	15%
Unknown	2	10%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	5	25%
Biochemistry, Genetics and Molecular Biology	5	25%
Computer Science	3	15%
Veterinary Science and Veterinary Medicine	1	5%
Economics, Econometrics and Finance	1	5%
Other	3	15%
Unknown	2	10%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 July 2016.

Comparison set	Rank	Out of
All research outputs	#20,335,770	22,880,691 outputs
Outputs from BMC Bioinformatics	#6,872	7,298 outputs
Outputs of similar age	#317,189	363,105 outputs
Outputs of similar age from BMC Bioinformatics	#95	108 outputs

Altmetric has tracked 22,880,691 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,298 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 363,105 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 108 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
