Integration of multi-omics data for prediction of phenotypic traits using random forest

Overview of attention for article published in BMC Bioinformatics, June 2016

About this Attention Score

  • Good Attention Score compared to outputs of the same age (65th percentile)
  • Above-average Attention Score compared to outputs of the same age and source (55th percentile)

Mentioned by

twitter
7 X users

Citations

dimensions_citation
73 Dimensions

Readers on

mendeley
180 Mendeley
Title
Integration of multi-omics data for prediction of phenotypic traits using random forest
Published in
BMC Bioinformatics, June 2016
DOI 10.1186/s12859-016-1043-4
Pubmed ID
Authors

Animesh Acharjee, Bjorn Kloosterman, Richard G. F. Visser, Chris Maliepaard

Abstract

In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest.

We used Random Forest regression for integrating multiple ~omics data for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour, beta-carotene hydroxylase and zeaxanthin epoxidase, both of which have previously been associated with flesh colour in potato tubers, were ranked first and forty-fourth, respectively. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75%, only slightly more than what gene expression or LC-MS data explain by themselves, which indicates that there are correlations among the variables across data sets. For tuber shape regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data was found to explain significant variation. For DSC onset, we found 12 significant gene expression variables, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45%. Expression QTL (eQTL) analyses showed many associations with genomic regions on chromosome 2, which also had the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis on enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine.

In this study, we developed a strategy for selecting and integrating multiple ~omics data using the random forest method and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data.
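The integration step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor, purely synthetic data, and hypothetical block names and sizes; none of these come from the paper. The out-of-bag R² is used here as a stand-in for "variation explained", since the abstract does not say how that quantity or variable significance was computed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 96  # hypothetical population size, not taken from the paper

# Synthetic stand-ins for the four ~omics blocks (names and sizes are assumptions).
blocks = {
    "gene": rng.normal(size=(n_samples, 30)),
    "lcms": rng.normal(size=(n_samples, 20)),
    "gcms": rng.normal(size=(n_samples, 15)),
    "protein": rng.normal(size=(n_samples, 10)),
}

# Simulated trait driven by one transcript and one LC-MS peak, plus noise.
trait = (
    2.0 * blocks["gene"][:, 0]
    + 1.5 * blocks["lcms"][:, 3]
    + rng.normal(scale=0.5, size=n_samples)
)

# Integrate the blocks by column-wise concatenation, keeping track of variable names.
X = np.hstack(list(blocks.values()))
names = [f"{block}_{j}" for block, data in blocks.items() for j in range(data.shape[1])]

# Fit one forest on the combined matrix; oob_score=True gives an out-of-bag R^2.
forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, trait)

# Rank all variables across data sets by impurity-based importance.
ranking = sorted(zip(names, forest.feature_importances_), key=lambda p: p[1], reverse=True)
top_features = [name for name, _ in ranking[:5]]
oob_r2 = forest.oob_score_

print("Top-ranked variables:", top_features)
print("Out-of-bag R^2:", round(oob_r2, 2))
```

Fitting the same forest on each block separately and comparing the out-of-bag R² values would mimic the per-data-set comparison reported for tuber shape, again only as an approximation of the published analysis.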

X Demographics

The data shown below were collected from the profiles of 7 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 180 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
Spain          1    <1%
Brazil         1    <1%
Unknown      178    99%

Demographic breakdown

Readers by professional status   Count   As %
Student > Ph. D. Student            40    22%
Researcher                          38    21%
Student > Master                    23    13%
Student > Bachelor                  14     8%
Student > Doctoral Student          13     7%
Other                               16     9%
Unknown                             36    20%

Readers by discipline                          Count   As %
Agricultural and Biological Sciences              64    36%
Biochemistry, Genetics and Molecular Biology      26    14%
Computer Science                                  14     8%
Mathematics                                        6     3%
Engineering                                        6     3%
Other                                             20    11%
Unknown                                           44    24%

Attention Score in Context

This research output has an Altmetric Attention Score of 4. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 25 June 2018.
All research outputs: #7,872,767 of 24,598,501 outputs
Outputs from BMC Bioinformatics: #2,967 of 7,559 outputs
Outputs of similar age: #119,744 of 347,085 outputs
Outputs of similar age from BMC Bioinformatics: #41 of 90 outputs
Altmetric has tracked 24,598,501 research outputs across all sources so far. This one has received more attention than most of these and is in the 67th percentile.
So far Altmetric has tracked 7,559 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has gotten more attention than average, scoring higher than 60% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 347,085 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 65% of its contemporaries.
We're also able to compare this research output to 90 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 55% of its contemporaries.
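The percentages quoted above can be related to the listed ranks with a short calculation. This is a sketch under an assumption: it takes the percentile to be the share of the other tracked outputs ranked below this one, rounded down. Altmetric's exact method (for example, how ties are handled) is not stated on this page, and the function name is hypothetical.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Share of the other outputs ranked below this one, as a whole percent.

    Assumed formula, not Altmetric's documented method:
    floor(100 * (total - rank) / (total - 1)).
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("need total >= 2 and 1 <= rank <= total")
    return (100 * (total - rank)) // (total - 1)


# Ranks and totals as listed on this page.
contexts = {
    "All research outputs": (7_872_767, 24_598_501),
    "Outputs from BMC Bioinformatics": (2_967, 7_559),
    "Outputs of similar age": (119_744, 347_085),
    "Outputs of similar age from BMC Bioinformatics": (41, 90),
}

for label, (rank, total) in contexts.items():
    print(f"{label}: {percentile_from_rank(rank, total)}%")
```

Under this assumption the four contexts give 67, 60, 65 and 55, which agree with the percentages stated in the text.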