Predicting protein-binding regions in RNA using nucleotide profiles and compositions

Overview of attention for article published in BMC Systems Biology, March 2017

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

3 X users

Citations

17 Dimensions

Readers on

22 Mendeley
Title
Predicting protein-binding regions in RNA using nucleotide profiles and compositions
Published in
BMC Systems Biology, March 2017
DOI 10.1186/s12918-017-0386-4
Pubmed ID
Authors

Daesik Choi, Byungkyu Park, Hanju Chae, Wook Lee, Kyungsook Han

Abstract

Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use.
We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained on a balanced dataset of positive and negative instances.
10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remained high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing our model and other state-of-the-art methods on the same dataset showed that our model is better than the others.
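The evaluation measures quoted above all derive from the four counts of a binary confusion matrix. The following is a minimal Python sketch assuming the standard textbook definitions; the function name `binary_metrics` and the counts in the usage lines are illustrative and are not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true binding regions that were found
    specificity = tn / (tn + fp)   # share of non-binding regions correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
    }

# Hypothetical counts for a balanced set of 100 positives and 100 negatives.
metrics = binary_metrics(tp=90, fn=10, tn=85, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On a balanced dataset, accuracy is the mean of sensitivity and specificity, which agrees with the 91.6%, 92.4% and 92.0% reported for 10-fold cross validation.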
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. However, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but they demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding.
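The abstract does not spell out how the log-odds profiles are built, so the Python sketch below is only one plausible reading. It scores each mono- and di-nucleotide by the log ratio of its frequency in binding regions to its frequency in non-binding regions, then describes a candidate region by its per-position scores plus its nucleotide composition. The function names, the pseudocount, and the toy sequences are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGU"

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log_odds_table(pos_seqs, neg_seqs, k, pseudo=1.0):
    """log2 of (frequency in binding regions / frequency in non-binding regions).

    A pseudocount (an assumption here) keeps unseen k-mers from producing log(0).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    pos, neg = kmer_counts(pos_seqs, k), kmer_counts(neg_seqs, k)
    pos_total = sum(pos[m] for m in kmers) + pseudo * len(kmers)
    neg_total = sum(neg[m] for m in kmers) + pseudo * len(kmers)
    return {
        m: math.log2(((pos[m] + pseudo) / pos_total) / ((neg[m] + pseudo) / neg_total))
        for m in kmers
    }

def region_features(seq, mono, di):
    """Mono-nucleotide profile + di-nucleotide profile + composition of one region."""
    profile = [mono[c] for c in seq]
    profile += [di[seq[i:i + 2]] for i in range(len(seq) - 1)]
    composition = [seq.count(c) / len(seq) for c in ALPHABET]
    return profile + composition

# Toy training regions (hypothetical, not from the paper's datasets).
binding = ["GGAGGA", "AGGAGG"]
non_binding = ["CUCUCU", "UCUCUC"]
mono = log_odds_table(binding, non_binding, k=1)
di = log_odds_table(binding, non_binding, k=2)

features = region_features("GGAUCU", mono, di)
print(len(features), [round(x, 2) for x in features])
```

Regions of a fixed length yield feature vectors of a fixed length, which is the form an SVM expects as input.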

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 22 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    22       100%

Demographic breakdown

Readers by professional status                  Count    As %
Student > Ph. D. Student                        8        36%
Student > Master                                4        18%
Student > Bachelor                              3        14%
Researcher                                      2        9%
Unspecified                                     1        5%
Other                                           1        5%
Unknown                                         3        14%

Readers by discipline                           Count    As %
Computer Science                                7        32%
Agricultural and Biological Sciences            6        27%
Biochemistry, Genetics and Molecular Biology    4        18%
Unspecified                                     1        5%
Medicine and Dentistry                          1        5%
Other                                           1        5%
Unknown                                         2        9%
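The "As %" column is each count divided by the 22 readers, rounded to the nearest whole percent. A short Python check using the professional-status counts from the table above (the variable names are illustrative):

```python
# Counts copied from the "Readers by professional status" table.
status_counts = {
    "Student > Ph. D. Student": 8,
    "Student > Master": 4,
    "Student > Bachelor": 3,
    "Researcher": 2,
    "Unspecified": 1,
    "Other": 1,
    "Unknown": 3,
}

total = sum(status_counts.values())
shares = {label: round(100 * n / total) for label, n in status_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print("sum of rounded shares:", sum(shares.values()))
```

Because each share is rounded independently, the column adds up to 101% rather than exactly 100%.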
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 31 March 2017.
All research outputs: #14,928,316 of 22,961,203 outputs
Outputs from BMC Systems Biology: #602 of 1,144 outputs
Outputs of similar age: #184,569 of 307,967 outputs
Outputs of similar age from BMC Systems Biology: #15 of 32 outputs
Altmetric has tracked 22,961,203 research outputs across all sources so far. This one is in the 32nd percentile – i.e., 32% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,144 research outputs from this source. They receive a mean Attention Score of 3.6. This one is in the 43rd percentile – i.e., 43% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 307,967 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 37th percentile – i.e., 37% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 32 others from the same source and published within six weeks on either side of this one. This one is in the 46th percentile – i.e., 46% of its contemporaries scored the same or lower than it.
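The percentile statements above share one definition: the share of compared outputs whose score is the same or lower. A minimal Python sketch of that definition follows; the function name and the peer scores are made up for illustration, and Altmetric's exact tie handling and rounding are not described on this page.

```python
def percentile_same_or_lower(score, peer_scores):
    """Percentage of peer scores that are less than or equal to `score`."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return 100 * at_or_below / len(peer_scores)

# Hypothetical Attention Scores of ten peer outputs.
peers = [0, 1, 1, 2, 2, 3, 5, 8, 13, 40]
pct = percentile_same_or_lower(2, peers)
print(f"{pct:.0f}% of peers scored the same or lower")
```

With many tied low scores, as is typical for attention data, a rank alone does not determine the percentile, so the figures need not follow directly from the rank lines.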