Optimizing graph-based patterns to extract biomedical events from the literature

Overview of attention for article published in BMC Bioinformatics, October 2015

Citations: 9 (Dimensions)
Readers: 26 (Mendeley)
Title
Optimizing graph-based patterns to extract biomedical events from the literature
Published in
BMC Bioinformatics, October 2015
DOI 10.1186/1471-2105-16-s16-s2
Pubmed ID
Authors

Haibin Liu, Karin Verspoor, Donald C Comeau, Andrew D MacKinlay, W John Wilbur

Abstract

IN BIONLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task, which focuses on 13 molecular biology related event types, and the Cancer Genetics (CG) task, which targets a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and we evaluated the impact on event extraction of using paths of all possible lengths as key context dependencies, rather than only the shortest paths. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th).

AFTER BIONLP-ST 2013: We explored three ways to further extend the event extraction system from our previously published work: (1) We allowed non-essential nodes to be skipped and incorporated a node-skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a single subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions improved extraction precision from 62% to 65%. However, the overall F-score stayed equivalent to the previous performance because of a 1% drop in recall.
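The closing claim of the abstract, that a three-point precision gain and a one-point recall loss leave the F-score roughly unchanged, can be checked with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code. It assumes the balanced F-score (harmonic mean of precision and recall), assumes the rounded 62% precision belongs to the same run as the 48.93% GE F-score, and assumes "a 1% drop" means one percentage point. The function names `f_score` and `recall_from` are hypothetical helpers introduced here.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from(f: float, precision: float) -> float:
    """Solve F = 2PR / (P + R) for R, given F and P."""
    return f * precision / (2 * precision - f)


# Figures quoted in the abstract (the precision values are rounded there).
f_before = 0.4893
p_before = 0.62
p_after = 0.65

# Assumption: the 62% precision and the 48.93% F-score come from the same run.
r_before = recall_from(f_before, p_before)

# Assumption: "a 1% drop in recall" means one percentage point.
r_after = r_before - 0.01
f_after = f_score(p_after, r_after)

print(f"implied recall before: {r_before:.4f}")
print(f"implied recall after:  {r_after:.4f}")
print(f"implied F-score after: {f_after:.4f}")
```

Under these assumptions the implied recall is a little over 40%, and the recomputed F-score differs from 48.93% by well under one percentage point, which is consistent with the abstract's statement that overall performance stayed equivalent.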

Mendeley readers

The data shown below were compiled from readership statistics for 26 Mendeley readers of this research output.

Geographical breakdown

Country      Count   As %
Australia    2       8%
Unknown      24      92%

Demographic breakdown

Readers by professional status         Count   As %
Researcher                             6       23%
Student > Master                       6       23%
Student > Bachelor                     3       12%
Professor                              2       8%
Student > Doctoral Student             2       8%
Other                                  6       23%
Unknown                                1       4%

Readers by discipline                  Count   As %
Computer Science                       12      46%
Medicine and Dentistry                 8       31%
Agricultural and Biological Sciences   1       4%
Nursing and Health Professions         1       4%
Decision Sciences                      1       4%
Other                                  1       4%
Unknown                                2       8%