The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches

Overview of attention for article published in Giga Science, September 2015

About this Attention Score

  • Good Attention Score compared to outputs of the same age (71st percentile)

Mentioned by

  • 4 X users
  • 1 peer review site
  • 1 Facebook page
  • 1 Google+ user

Citations

  • 16 Dimensions

Readers on

  • 38 Mendeley
Published in
Giga Science, September 2015
DOI 10.1186/s13742-015-0083-4
Authors

Ishita K. Khan, Qing Wei, Samuel Chapman, Dukka B. KC, Daisuke Kihara

Abstract

Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems for Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple in-house AFP methods. Here, we report benchmark results of our methods obtained during preparation for CAFA2, before we submitted function predictions for the CAFA2 targets. For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performance using the original (older) and updated databases. Performance evaluations of PFP under different settings and of ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performance of our prediction methods by enriching the predictions with the prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed. Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms yielded little improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.
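
The abstract reports accuracy as Fmax without defining it. As commonly used in CAFA-style evaluations, Fmax is the maximum, over score thresholds, of the harmonic mean of precision and recall averaged across proteins. The following is a minimal sketch of that definition, not the authors' evaluation code: the function name `fmax`, the input layout, and the threshold grid are assumptions made for illustration, and the sketch ignores propagation of GO terms to their ancestors.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (illustrative sketch).

    predictions: {protein: {go_term: confidence score in [0, 1]}}
    truth:       {protein: non-empty set of true go_terms}
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0  # accumulated over all benchmark proteins
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)

        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Precision is averaged only over proteins that receive at least one prediction at the threshold, while recall is averaged over every benchmark protein, so a method that stays silent on hard targets is penalized through recall.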

X Demographics

The data shown below were collected from the profiles of 4 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 38 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    38       100%

Demographic breakdown

Readers by professional status    Count    As %
Student > Master                  5        13%
Student > Doctoral Student        4        11%
Professor                         3        8%
Researcher                        3        8%
Lecturer                          2        5%
Other                             8        21%
Unknown                           13       34%

Readers by discipline                           Count    As %
Computer Science                                6        16%
Agricultural and Biological Sciences            6        16%
Business, Management and Accounting             4        11%
Biochemistry, Genetics and Molecular Biology    3        8%
Economics, Econometrics and Finance             2        5%
Other                                           4        11%
Unknown                                         13       34%
Attention Score in Context

This research output has an Altmetric Attention Score of 5. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 21 September 2015.

  • All research outputs: #7,204,326 of 25,371,288 outputs
  • Outputs from Giga Science: #961 of 1,167 outputs
  • Outputs of similar age: #79,175 of 280,717 outputs
  • Outputs of similar age from Giga Science: #14 of 16 outputs

Altmetric has tracked 25,371,288 research outputs across all sources so far. This one has received more attention than most of these and is in the 71st percentile.
So far Altmetric has tracked 1,167 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 21.8. This one is in the 17th percentile – i.e., 17% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 280,717 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 71% of its contemporaries.
We're also able to compare this research output to 16 others from the same source and published within six weeks on either side of this one. This one is in the 12th percentile – i.e., 12% of its contemporaries scored the same or lower than it.
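
The percentiles quoted above can be related to the rankings with a little arithmetic. The sketch below assumes that a percentile is the share of outputs ranked below this one, rounded down to a whole percent. This is an assumption rather than Altmetric's documented formula, and the helper name `percentile_from_rank` is hypothetical, but it does reproduce all four figures on this page (71, 17, 71 and 12).

```python
def percentile_from_rank(rank, total):
    """Whole-percent share of outputs ranked below position `rank`.

    Assumed formula: floor(100 * (total - rank) / total),
    computed with integer arithmetic to avoid rounding surprises.
    """
    return (100 * (total - rank)) // total


# Rankings as listed on this page: (rank, number of outputs compared)
rankings = {
    "All research outputs": (7_204_326, 25_371_288),
    "Outputs from Giga Science": (961, 1_167),
    "Outputs of similar age": (79_175, 280_717),
    "Outputs of similar age from Giga Science": (14, 16),
}

for label, (rank, total) in rankings.items():
    print(f"{label}: percentile {percentile_from_rank(rank, total)}")
```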