Report for: Predicting protein-binding regions in RNA using nucleotide profiles and compositions


Title	Predicting protein-binding regions in RNA using nucleotide profiles and compositions
Published in	BMC Systems Biology, March 2017
DOI	10.1186/s12918-017-0386-4
Pubmed ID	28361677
Authors	Daesik Choi, Byungkyu Park, Hanju Chae, Wook Lee, Kyungsook Han
Abstract	Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses nucleotide compositions and sequence profiles constructed from log-odds scores of mono- and di-nucleotides. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The model performed best on a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation yielded lower performance than 10-fold cross validation, but performance remained high (87.6% accuracy and 0.752 MCC). On independent test datasets, the model achieved an accuracy of 82.2% and an MCC of 0.656. When our model and other state-of-the-art methods were tested on the same dataset, our model outperformed the others.
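The six measures reported above are standard binary-classification statistics. As a minimal sketch (not the authors' code), each can be computed from the four cells of a confusion matrix, treating protein-binding regions as positives. The counts in the example are hypothetical: a balanced set of 1,000 binding and 1,000 non-binding regions chosen so that sensitivity and specificity equal the reported 91.6% and 92.4%. PPV and NPV also depend on the ratio of positives to negatives, which the abstract does not state, so the example's PPV and NPV need not match the published values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for binding (positive) vs non-binding (negative) regions."""
    tp: int  # binding regions predicted as binding
    fn: int  # binding regions predicted as non-binding
    tn: int  # non-binding regions predicted as non-binding
    fp: int  # non-binding regions predicted as binding


def metrics(c: Confusion) -> dict[str, float]:
    """Sensitivity, specificity, accuracy, PPV, NPV and MCC from raw counts."""
    sensitivity = c.tp / (c.tp + c.fn)  # recall on binding regions
    specificity = c.tn / (c.tn + c.fp)  # recall on non-binding regions
    accuracy = (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
    ppv = c.tp / (c.tp + c.fp)  # precision of "binding" predictions
    npv = c.tn / (c.tn + c.fn)  # precision of "non-binding" predictions
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "ppv": ppv, "npv": npv, "mcc": mcc}


# Hypothetical balanced test set: 1,000 binding and 1,000 non-binding regions.
example = Confusion(tp=916, fn=84, tn=924, fp=76)
for name, value in metrics(example).items():
    print(f"{name:12s}{value:.3f}")
```

MCC remains informative when the classes are imbalanced, which matters here because the model was also tested on datasets with more non-binding than binding regions.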
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. However, a slight performance gain was obtained when the sequence profiles were used together with nucleotide compositions. These are preliminary results of ongoing research, but they demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .
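The abstract names the two feature types but not their exact formulas. The sketch below assumes one plausible construction: a log-odds score log2(f_binding / f_non-binding) for each mononucleotide and dinucleotide, smoothed with a pseudocount of 1; a per-position profile that replaces each nucleotide and each overlapping dinucleotide of a window by its score; and the mono- and di-nucleotide composition of the window (4 + 16 frequencies). The function names, pseudocount, log base, window length and toy sequences are all hypothetical. Vectors of this kind would then be passed to an SVM classifier.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
MONO = list(BASES)
DI = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 dinucleotides


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count overlapping k-mers across a set of RNA sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_odds_table(pos: list[str], neg: list[str], k: int,
                   pseudo: float = 1.0) -> dict[str, float]:
    """Hypothetical log2(freq in binding / freq in non-binding) per k-mer.

    The pseudocount (an assumption) keeps k-mers unseen in one set finite.
    """
    alphabet = MONO if k == 1 else DI
    cp, cn = kmer_counts(pos, k), kmer_counts(neg, k)
    total_p = sum(cp[a] for a in alphabet) + pseudo * len(alphabet)
    total_n = sum(cn[a] for a in alphabet) + pseudo * len(alphabet)
    return {a: math.log2(((cp[a] + pseudo) / total_p) / ((cn[a] + pseudo) / total_n))
            for a in alphabet}


def features(window: str, mono_lo: dict[str, float],
             di_lo: dict[str, float]) -> list[float]:
    """Per-position log-odds profile followed by mono- and di-nucleotide composition."""
    profile = [mono_lo.get(b, 0.0) for b in window]
    profile += [di_lo.get(window[i:i + 2], 0.0) for i in range(len(window) - 1)]
    mono_c, di_c = kmer_counts([window], 1), kmer_counts([window], 2)
    n1, n2 = max(len(window), 1), max(len(window) - 1, 1)
    composition = [mono_c[b] / n1 for b in MONO] + [di_c[d] / n2 for d in DI]
    return profile + composition


# Toy, invented training regions for illustration only.
binding = ["GGAUUUAGG", "UUUAGGAUU"]
nonbinding = ["CCGCACGCA", "ACGCCCAGC"]
mono = log_odds_table(binding, nonbinding, k=1)
di = log_odds_table(binding, nonbinding, k=2)
x = features("GAUUA", mono, di)
print(len(x))  # 29
```

For a 5-nt window the vector has 5 + 4 + 20 = 29 entries. Training one SVM on the profile part alone and another on the full vector mirrors the profile-versus-composition comparison described in the abstract.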


X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	1	33%
France	1	33%
United Kingdom	1	33%

Demographic breakdown

Type	Count	As %
Scientists	3	100%

Mendeley readers

The data shown below were compiled from readership statistics for 22 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	22	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	8	36%
Student > Master	4	18%
Student > Bachelor	3	14%
Researcher	2	9%
Unspecified	1	5%
Other	1	5%
Unknown	3	14%

Readers by discipline	Count	As %
Computer Science	7	32%
Agricultural and Biological Sciences	6	27%
Biochemistry, Genetics and Molecular Biology	4	18%
Unspecified	1	5%
Medicine and Dentistry	1	5%
Other	1	5%
Unknown	2	9%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 31 March 2017.

Comparison group	Rank	Total outputs
All research outputs	#14,928,316	22,961,203
Outputs from BMC Systems Biology	#602	1,144
Outputs of similar age	#184,569	307,967
Outputs of similar age from BMC Systems Biology	#15	32

Altmetric has tracked 22,961,203 research outputs across all sources so far. This one is in the 32nd percentile – i.e., 32% of other outputs scored the same or lower than it.

So far Altmetric has tracked 1,144 research outputs from this source. They receive a mean Attention Score of 3.6. This one is in the 43rd percentile – i.e., 43% of its peers scored the same or lower than it.

Older research outputs tend to score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score with the 307,967 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 37th percentile – i.e., 37% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 32 others from the same source and published within six weeks on either side of this one. This one is in the 46th percentile – i.e., 46% of its contemporaries scored the same or lower than it.
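Each percentile statement above applies one rule: the share of comparison outputs that scored the same as or lower than this one, optionally restricted to outputs published within six weeks on either side. A minimal sketch of that rule follows. It is not Altmetric's implementation; the tie handling, rounding down, function names, and the dates and scores in the example are all assumptions, so it will not necessarily reproduce the exact percentiles reported here.

```python
from datetime import date, timedelta


def percentile_rank(score: float, others: list[float]) -> int:
    """Percent of *other* outputs whose score is the same or lower (rounded down)."""
    if not others:
        return 0
    same_or_lower = sum(1 for s in others if s <= score)
    return 100 * same_or_lower // len(others)


def contemporaries(pub: date, outputs: list[tuple[date, float]],
                   weeks: int = 6) -> list[float]:
    """Scores of outputs published within `weeks` weeks either side of `pub`."""
    window = timedelta(weeks=weeks)
    return [s for d, s in outputs if abs(d - pub) <= window]


# Hypothetical publication date and tracked outputs (date, score).
pub = date(2017, 3, 15)
tracked = [(date(2017, 3, 1), 0.0), (date(2017, 3, 20), 2.0),
           (date(2017, 4, 10), 5.0), (date(2017, 6, 30), 1.0),
           (date(2016, 12, 1), 8.0)]
peers = contemporaries(pub, tracked)
print(peers)                        # [0.0, 2.0, 5.0]
print(percentile_rank(2.0, peers))  # 66
```

Counting ties as "same or lower" means that outputs with very common low scores, such as 0 or 1, push every output with a slightly higher score into a higher percentile.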
