Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data

Overview of attention for article published in BMC Bioinformatics, March 2017

Mentioned by
1 Facebook page

Citations
14 Dimensions

Readers on
31 Mendeley
1 CiteULike
Title
Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data
Published in
BMC Bioinformatics, March 2017
DOI 10.1186/s12859-017-1533-z
Pubmed ID
Authors

Raquel Dias, Bryan Kolaczkowski

Abstract

One goal of structural biology is to understand how a protein's 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typically sufficient to rapidly predict ligand affinity with high accuracy (>70% correlation between predicted and experimentally-determined affinity), a fact that is exploited to support structure-based drug design. We recently found that protein-DNA/RNA affinity can also be predicted with high accuracy using extensions of existing techniques, but protein-protein affinity could not be predicted with >60% correlation, even when the protein-protein complex was available.

X-ray and NMR structures of protein-protein complexes, their associated binding affinities and experimental conditions were obtained from different binding affinity and structural databases. Statistical models were implemented using a generalized linear model framework, including the experimental conditions as new model features. We evaluated the potential for new features to improve affinity prediction models by calculating the Pearson correlation between predicted and experimental binding affinities on the training and test data after model fitting and after cross-validation. Differences in accuracy were assessed using two-sample t test and nonparametric Mann-Whitney U test.

Here we evaluate a range of potential factors that may interfere with accurate protein-protein affinity prediction. We find that X-ray crystal resolution has the strongest single effect on protein-protein affinity prediction. Limiting our analyses to only high-resolution complexes (≤2.5 Å) increased the correlation between predicted and experimental affinity from 54 to 68% (p = 4.32 × 10⁻³). In addition, incorporating information on the experimental conditions under which affinities were measured (pH, temperature and binding assay) had significant effects on prediction accuracy. We also highlight a number of potential errors in large structure-affinity databases, which could affect both model training and accuracy assessment.

The results suggest that the accuracy of statistical models for protein-protein affinity prediction may be limited by the information present in databases used to train new models. Improving our capacity to integrate large-scale structural and functional information may be required to substantively advance our understanding of the general principles by which a protein's structure determines its function.
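
The evaluation described in the abstract (fit a model, then report the Pearson correlation between predicted and experimental affinities on the training data and under cross-validation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the single feature `score` and the helper names `pearson`, `fit_line` and `cross_validated_r` are hypothetical, and ordinary least squares stands in for the generalized linear model (it corresponds to a Gaussian GLM with an identity link). The percentage correlations quoted in the abstract presumably correspond to r × 100.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (one feature only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_r(xs, ys, k=5, seed=0):
    """Pearson r between held-out predictions and observations over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds, obs = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds.extend(intercept + slope * xs[i] for i in fold)
        obs.extend(ys[i] for i in fold)
    return pearson(preds, obs)


# Synthetic stand-in data (assumption): one structural score per complex
# and a noisy "experimental" affinity that depends linearly on it.
rng = random.Random(42)
score = [rng.uniform(0, 10) for _ in range(100)]
affinity = [1.5 + 0.8 * s + rng.gauss(0, 2.0) for s in score]

intercept, slope = fit_line(score, affinity)
r_train = pearson([intercept + slope * s for s in score], affinity)
r_cv = cross_validated_r(score, affinity)
print(f"training r = {r_train:.2f}, cross-validated r = {r_cv:.2f}")
```

Comparing `r_train` with `r_cv` gives a rough sense of how much of the training-set correlation survives on held-out data. A real analysis would use several features, including the experimental conditions the abstract mentions, and a library GLM implementation.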

Mendeley readers

The data shown below were compiled from readership statistics for 31 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    31       100%

Demographic breakdown

Readers by professional status                  Count    As %
Student > Ph. D. Student                        7        23%
Student > Master                                4        13%
Researcher                                      4        13%
Student > Bachelor                              4        13%
Student > Postgraduate                          3        10%
Other                                           4        13%
Unknown                                         5        16%

Readers by discipline                           Count    As %
Biochemistry, Genetics and Molecular Biology    9        29%
Chemistry                                       3        10%
Computer Science                                2        6%
Agricultural and Biological Sciences            2        6%
Psychology                                      2        6%
Other                                           7        23%
Unknown                                         6        19%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 27 March 2017.
All research outputs: #20,411,380 of 22,961,203 outputs
Outputs from BMC Bioinformatics: #6,881 of 7,306 outputs
Outputs of similar age: #269,561 of 309,217 outputs
Outputs of similar age from BMC Bioinformatics: #110 of 124 outputs
Altmetric has tracked 22,961,203 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,306 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 309,217 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 124 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.