Report for: Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences


Title	Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences
Published in	BMC Bioinformatics, March 2016
DOI	10.1186/s12859-016-0959-z
Pubmed ID	26940649
Authors	Binghuang Cai, Xia Jiang
Abstract	Ubiquitination is an important post-translational modification of proteins and has been widely investigated by biologists. Different experimental and computational methods have been developed to identify the ubiquitination sites in protein sequences. This paper explores computational machine learning methods for predicting ubiquitination sites using the physicochemical properties (PCPs) of amino acids in protein sequences. We first establish six different ubiquitination data sets, whose records contain both ubiquitination sites and non-ubiquitination sites with varying numbers of protein sequence segments. To establish these data sets, protein sequence segments are extracted from the original protein sequences used in four published papers on ubiquitination, and 531 PCP features are calculated for each extracted segment by averaging the PCP values from AAindex (Amino Acid index database) over all amino acids in the segment. Various computational machine learning methods, including four Bayesian network methods (i.e., Naïve Bayes (NB), Feature Selection NB (FSNB), Model Averaged NB (MANB), and Efficient Bayesian Multivariate Classifier (EBMC)) and three regression methods (i.e., Support Vector Machine (SVM), Logistic Regression (LR), and Least Absolute Shrinkage and Selection Operator (LASSO)), are then applied to the six segment-PCP data sets. Five-fold cross-validation and the Area Under the Receiver Operating Characteristic Curve (AUROC) are used to evaluate the ubiquitination prediction performance of each method. The results demonstrate that the PCP data of protein sequences contain information that can be mined by machine learning methods for ubiquitination site prediction. 
The comparative results show that EBMC, SVM and LR perform better than the other methods, and EBMC is the only method that achieves AUROC values greater than or equal to 0.6 on each of the six data sets. The results also show that EBMC tends to perform better on larger data sets. Machine learning methods have been employed to predict ubiquitination sites from the physicochemical properties of amino acids in protein sequences. The results demonstrate the effectiveness of machine learning for mining PCP data from protein sequences, as well as the superiority of EBMC, SVM and LR (especially EBMC) over the other methods for ubiquitination site prediction.
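The abstract names two computational steps that a short sketch can make concrete: turning a sequence segment into a PCP feature by averaging per-residue property values, and scoring predictions with AUROC under k-fold cross-validation. The Python below is a minimal illustration under stated assumptions, not the authors' code. It uses the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) as a single stand-in for the 531 AAindex properties, centers segments on lysine (K) residues with a hypothetical `half_window` of 10 because the abstract does not state the segment length, and computes AUROC as the normalized Mann-Whitney statistic. All function names are hypothetical, and `k_fold_indices` is a simplified, unshuffled split rather than whatever partitioning the paper used.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (AAindex KYTJ820101); a stand-in for ONE of
# the 531 AAindex properties the abstract mentions.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lysine_segments(sequence, half_window=10):
    """Yield (position, segment) for every lysine, taking up to `half_window`
    residues on each side (assumed window size; truncated at sequence ends)."""
    for i, residue in enumerate(sequence):
        if residue == "K":
            start = max(0, i - half_window)
            yield i, sequence[start:i + half_window + 1]


def average_property(segment, scale):
    """Mean property value over the residues of `segment` covered by `scale`.
    Every lysine-centered segment contains K, so the list is never empty."""
    return mean(scale[aa] for aa in segment if aa in scale)


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k interleaved folds (simplified, no shuffling)."""
    return [list(range(f, n, k)) for f in range(k)]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (normalized Mann-Whitney U)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For each `(pos, seg)` produced by `lysine_segments(protein)`, `average_property(seg, HYDROPATHY)` yields one feature per candidate site; repeating this with every AAindex scale would give a per-segment feature vector of the kind the abstract describes. A classifier trained on four of the five folds would then score the held-out fold, and `auroc` compares those scores with the site labels.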


X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	1	50%
Unknown	1	50%

Demographic breakdown

Type	Count	As %
Scientists	1	50%
Members of the public	1	50%

Mendeley readers

The data shown below were compiled from readership statistics for 41 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	41	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	8	20%
Student > Master	7	17%
Researcher	6	15%
Student > Bachelor	3	7%
Other	3	7%
Other	2	5%
Unknown	12	29%

Readers by discipline	Count	As %
Computer Science	10	24%
Biochemistry, Genetics and Molecular Biology	6	15%
Agricultural and Biological Sciences	5	12%
Medicine and Dentistry	3	7%
Engineering	2	5%
Other	3	7%
Unknown	12	29%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 March 2016.

Comparison set	Rank	Total outputs
All research outputs	#14,839,922	22,852,911
Outputs from BMC Bioinformatics	#5,047	7,292
Outputs of similar age	#167,521	298,620
Outputs of similar age from BMC Bioinformatics	#94	129

Altmetric has tracked 22,852,911 research outputs across all sources so far. This one is in the 33rd percentile – i.e., 33% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,292 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 26th percentile – i.e., 26% of its peers scored the same or lower than it.

Older research outputs tend to score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 298,620 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 40th percentile – i.e., 40% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 129 others from the same source and published within six weeks on either side of this one. This one is in the 24th percentile – i.e., 24% of its contemporaries scored the same or lower than it.
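All four percentiles above follow the same stated rule: the share of a comparison set whose Attention Score is the same as or lower than this output's score of 2. The Python sketch below implements only that stated definition. The function name, the round-down rule, and the toy score list are assumptions for illustration (not Altmetric data or code); this page does not describe how Altmetric handles ties, rounding, or snapshot timing, so the sketch is not guaranteed to reproduce the rank figures in the table.

```python
def percentile_same_or_lower(score, other_scores):
    """Whole-number percentage of `other_scores` that are <= `score`.
    Rounding down is an assumption, not Altmetric's documented rule."""
    if not other_scores:
        raise ValueError("need at least one comparison score")
    at_or_below = sum(1 for s in other_scores if s <= score)
    return 100 * at_or_below // len(other_scores)


# Hypothetical toy comparison set (invented scores, not Altmetric data).
toy_scores = [0, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(percentile_same_or_lower(2, toy_scores))  # 3 of 10 at or below -> 30
```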
