
Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach

Overview of attention for an article published in BMC Bioinformatics, September 2016

Mentioned by: 2 X users
Citations: 27 (Dimensions)
Readers on Mendeley: 29
Title
Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach
Published in
BMC Bioinformatics, September 2016
DOI 10.1186/s12859-016-1209-0
Pubmed ID
Authors

Shamima Rashid, Saras Saraswathi, Andrzej Kloczkowski, Suresh Sundaram, Andrzej Kolinski

Abstract

Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in the final models. Previous works relied on cross-validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions. Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force-field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins. The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81 %. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst-performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils. The implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately, and (ii) SSP techniques sensitive enough to distinguish between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding may be needed to reduce Coil ⇔ Sheet misclassifications.
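As context for the accuracy figures above, the sketch below is an illustrative example only, not the authors' FCRN implementation; the state strings and function name are hypothetical. It shows how per-residue three-state (Helix/Sheet/Coil) predictions are commonly scored with Q3 accuracy and a per-state confusion count, the kind of tally used to spot the Coil ⇔ Sheet misclassifications discussed in the abstract.

```python
# Illustrative sketch (not the authors' code): Q3 accuracy and a
# Helix/Sheet/Coil confusion count from predicted vs. observed state strings.
from collections import Counter

STATES = "HEC"  # Helix, Sheet (extended), Coil

def q3_and_confusion(observed: str, predicted: str):
    """Return Q3 accuracy and a (observed, predicted) -> count table.

    `observed` and `predicted` are equal-length strings over H/E/C,
    one character per residue.
    """
    assert len(observed) == len(predicted)
    confusion = Counter()
    for o, p in zip(observed, predicted):
        confusion[(o, p)] += 1
    correct = sum(confusion[(s, s)] for s in STATES)
    return correct / len(observed), confusion

# Hypothetical toy chain in which two Coil residues are mispredicted
# as Sheet (the error mode discussed in the abstract).
obs  = "CCHHHHHCCEEEECC"
pred = "CEHHHHHCCEEEECE"
q3, conf = q3_and_confusion(obs, pred)
print(f"Q3 = {q3:.2%}")
print("Coil predicted as Sheet:", conf[("C", "E")])
```

On the toy strings above, two Coil residues are counted as Sheet mispredictions and Q3 comes out at 13/15 ≈ 87 %.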

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 29 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1     3%
Canada               1     3%
Unknown             27    93%

Demographic breakdown

Readers by professional status     Count   As %
Researcher                             7    24%
Student > Ph. D. Student               5    17%
Student > Bachelor                     5    17%
Professor > Associate Professor        2     7%
Student > Master                       2     7%
Other                                  5    17%
Unknown                                3    10%
Readers by discipline                           Count   As %
Biochemistry, Genetics and Molecular Biology        5    17%
Computer Science                                    5    17%
Agricultural and Biological Sciences                5    17%
Chemistry                                           4    14%
Environmental Science                               1     3%
Other                                               5    17%
Unknown                                             4    14%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 14 September 2016.
All research outputs: #18,471,305 of 22,888,307 outputs
Outputs from BMC Bioinformatics: #6,330 of 7,298 outputs
Outputs of similar age: #244,523 of 322,146 outputs
Outputs of similar age from BMC Bioinformatics: #99 of 121 outputs
Altmetric has tracked 22,888,307 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,298 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 5th percentile – i.e., 5% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 322,146 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 13th percentile – i.e., 13% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 121 others from the same source and published within six weeks on either side of this one. This one is in the 7th percentile – i.e., 7% of its contemporaries scored the same or lower than it.
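The percentile figures above are read as "X % of the comparison set scored the same as or lower than this output". A minimal sketch of that calculation, using a hypothetical peer set rather than Altmetric's actual data, is shown below.

```python
# Illustrative sketch (not Altmetric's code): percentile rank of a score
# within a comparison set, read as "% of peers scoring the same or lower".
def percentile_in_context(score, peer_scores):
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return 100.0 * at_or_below / len(peer_scores)

# Hypothetical peer set: with an Attention Score of 1, most peers score
# higher, so this output lands in a low percentile.
peers = [0, 0, 1, 1, 2, 3, 5, 8, 12, 20]
print(f"{percentile_in_context(1, peers):.0f}th percentile")  # 40th for this toy set
```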