
Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct

Overview of attention for article published in Journal of Biomedical Semantics, March 2015

Citations

18 Dimensions

Readers on

27 Mendeley
2 CiteULike
Published in: Journal of Biomedical Semantics, March 2015
DOI: 10.1186/s13326-015-0006-4
Authors: Christopher S Funk, Indika Kahanda, Asa Ben-Hur, Karin M Verspoor

Abstract

Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.

Mendeley readers

The data shown below were compiled from readership statistics for 27 Mendeley readers of this research output.

Geographical breakdown

Country     Count   As %
Australia       1     4%
Unknown        26    96%

Demographic breakdown

Readers by professional status  Count   As %
Student > Master                    5    19%
Student > Ph. D. Student            5    19%
Researcher                          4    15%
Student > Bachelor                  2     7%
Professor                           2     7%
Other                               2     7%
Unknown                             7    26%

Readers by discipline                         Count   As %
Computer Science                                 14    52%
Agricultural and Biological Sciences              3    11%
Biochemistry, Genetics and Molecular Biology      2     7%
Social Sciences                                   1     4%
Medicine and Dentistry                            1     4%
Other                                             0     0%
Unknown                                           6    22%