Prediction of plant lncRNA by ensemble machine learning classifiers

Overview of attention for article published in BMC Genomics, May 2018

About this Attention Score

  • Above-average Attention Score compared to outputs of the same age (63rd percentile)
  • Good Attention Score compared to outputs of the same age and source (69th percentile)

Mentioned by

  • 2 X users
  • 2 Wikipedia pages

Citations

  • 53 Dimensions

Readers on

  • 96 Mendeley
  • 1 CiteULike
Title
Prediction of plant lncRNA by ensemble machine learning classifiers
Published in
BMC Genomics, May 2018
DOI 10.1186/s12864-018-4665-2
Pubmed ID
Authors

Caitlin M. A. Simopoulos, Elizabeth A. Weretilnyk, G. Brian Golding

Abstract

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems.

Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features.

When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein.

Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
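
The stacking idea described in the abstract (gradient boosting and random forest base classifiers combined by a meta-learner, then used to rank candidates) can be illustrated with a minimal sketch. This is not the authors' implementation, which is not shown on this page: scikit-learn is swapped in for illustration, the data are synthetic stand-ins for transcript sequence features, and the logistic regression meta-learner, the feature count, and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transcript sequence features (hypothetical data,
# not the validated lncRNA training set described in the abstract).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: subsample < 1.0 makes the gradient boosting stochastic.
base_learners = [
    ("gbm", GradientBoostingClassifier(subsample=0.8, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base
# learners. Logistic regression is an assumed choice for this sketch.
ensemble = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
ensemble.fit(X_train, y_train)

# Rank held-out candidates by predicted probability of the positive class,
# highest first, mirroring the "predict and rank" use described above.
scores = ensemble.predict_proba(X_test)[:, 1]
ranking = scores.argsort()[::-1]
```

Here `ranking[0]` is the index of the held-out sample with the highest ensemble score. In a real setting the feature matrix would be derived from transcript sequences, and the ranked list would feed into follow-up validation rather than being read as a final classification.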

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 96 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    96       100%

Demographic breakdown

Readers by professional status                  Count    As %
Student > Ph. D. Student                        20       21%
Student > Master                                13       14%
Researcher                                      9        9%
Other                                           8        8%
Student > Doctoral Student                      7        7%
Other                                           14       15%
Unknown                                         25       26%

Readers by discipline                           Count    As %
Agricultural and Biological Sciences            33       34%
Biochemistry, Genetics and Molecular Biology    18       19%
Computer Science                                7        7%
Medicine and Dentistry                          3        3%
Unspecified                                     2        2%
Other                                           5        5%
Unknown                                         28       29%

Attention Score in Context

This research output has an Altmetric Attention Score of 4. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 28 March 2022.

  • All research outputs: #7,016,567 of 23,435,471 outputs
  • Outputs from BMC Genomics: #3,143 of 10,771 outputs
  • Outputs of similar age: #119,020 of 327,325 outputs
  • Outputs of similar age from BMC Genomics: #71 of 243 outputs

Altmetric has tracked 23,435,471 research outputs across all sources so far. This one has received more attention than most of these and is in the 69th percentile.
So far Altmetric has tracked 10,771 research outputs from this source. They receive a mean Attention Score of 4.7. This one has gotten more attention than average, scoring higher than 70% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 327,325 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 63% of its contemporaries.
We're also able to compare this research output to 243 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 69% of its contemporaries.