
An improved approach to infer protein-protein interaction based on a hierarchical vector space model

Overview of attention for article published in BMC Bioinformatics, April 2018

Citations

22 Dimensions

Readers on

266 Mendeley
1 CiteULike
Title: An improved approach to infer protein-protein interaction based on a hierarchical vector space model
Published in: BMC Bioinformatics, April 2018
DOI: 10.1186/s12859-018-2152-z
Pubmed ID:
Authors: Jiongmin Zhang, Ke Jia, Jinmeng Jia, Ying Qian

Abstract

Comparing and classifying the functions of gene products is important in today's biomedical research. The semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators of protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfactory. We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relations between GO terms. Besides the directly annotated terms, HVSM also takes into account their ancestors and descendants related by "is_a" and "part_of" relations. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement in distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using the online tool CESSM. In terms of AUC scores, HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC. The CESSM test showed that HVSM was comparable to SimGIC, and superior to all other similarity measures in CESSM as well as to TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM.
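
To make the idea in the abstract concrete, here is a minimal Python sketch of a vector space similarity in which each gene's term vector is extended with the ancestors of its annotated terms. It is an illustration only, not the HVSM implementation: the toy ontology, the term names `T1` to `T5`, the helper functions, and the `ancestor_weight` value are all assumptions, and the abstract does not give the actual weighting scheme. Descendant terms and the Certainty Factor are left out because their definitions are not stated here.

```python
import math

# Hypothetical toy ontology (not real GO terms): child -> set of parents,
# where each edge stands for an "is_a" or "part_of" relation.
TOY_PARENTS = {
    "T1": set(),
    "T2": {"T1"},
    "T3": {"T1"},
    "T4": {"T2"},
    "T5": {"T2", "T3"},
}


def ancestors(term, parents):
    """Collect every ancestor of `term` by following parent edges."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def term_vector(annotations, parents, ancestor_weight=0.5):
    """Build a sparse term vector for one gene.

    Directly annotated terms get weight 1.0. Ancestors get
    `ancestor_weight`, an assumed constant chosen for illustration.
    With ancestor_weight=0.0 this reduces to a basic vector space model.
    """
    vec = {}
    for term in annotations:
        vec[term] = 1.0
        for anc in ancestors(term, parents):
            vec[anc] = max(vec.get(anc, 0.0), ancestor_weight)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two hypothetical genes annotated with different but related terms.
gene_a = {"T4"}
gene_b = {"T5"}

basic = cosine(term_vector(gene_a, TOY_PARENTS, 0.0),
               term_vector(gene_b, TOY_PARENTS, 0.0))
hierarchical = cosine(term_vector(gene_a, TOY_PARENTS),
                      term_vector(gene_b, TOY_PARENTS))
print(basic, hierarchical)
```

In this toy case the two genes share no directly annotated term, so the basic model scores them as unrelated, while the ancestor-extended vectors overlap on shared parent terms and give a nonzero score. That is the kind of gap the abstract attributes to ignoring relations between GO terms.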

Mendeley readers

The data shown below were compiled from readership statistics for 266 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    266      100%

Demographic breakdown

Readers by professional status    Count    As %
Student > Master                  50       19%
Student > Ph. D. Student          49       18%
Student > Bachelor                26       10%
Researcher                        22       8%
Student > Doctoral Student        9        3%
Other                             27       10%
Unknown                           83       31%

Readers by discipline                           Count    As %
Computer Science                                107      40%
Engineering                                     25       9%
Biochemistry, Genetics and Molecular Biology    9        3%
Physics and Astronomy                           5        2%
Neuroscience                                    4        2%
Other                                           22       8%
Unknown                                         94       35%