
Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition

Overview of attention for article published in Journal of Biomedical Semantics, September 2016

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

4 X users

Citations

9 Dimensions

Readers on

35 Mendeley
Title
Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition
Published in
Journal of Biomedical Semantics, September 2016
DOI 10.1186/s13326-016-0096-7
Pubmed ID
Authors

Christopher S. Funk, K. Bretonnel Cohen, Lawrence E. Hunter, Karin M. Verspoor

Abstract

Gene Ontology (GO) terms are the standard for annotating and representing molecular functions, biological processes and cellular compartments, but a large gap exists between the way concepts are represented in the ontology and how they are expressed in natural language text. The construction of highly specific GO terms is formulaic: they are built from parts and pieces of simpler terms.

We present two different types of manually generated rules to help capture the variation in how GO terms can appear in natural language text. The first set of rules takes into account the compositional nature of GO and recursively decomposes terms into their smallest constituent parts. The second set of rules generates derivational variations of these smaller terms and compositionally combines all generated variants to produce variants of the original term. Applying both types of rules generates new synonyms for two-thirds of all GO terms and increases the F-measure for GO recognition on the CRAFT corpus from 0.498 to 0.636. Additionally, we evaluated the combination of both types of rules over one million full-text documents from Elsevier. Manual validation and error analysis of a random sample of annotations show that GO concepts are recognized with reasonable accuracy (88%).

In this work we present a set of simple synonym generation rules that exploit the highly compositional and formulaic nature of Gene Ontology concepts. We illustrate how the generated synonyms improve recognition of GO concepts on two different biomedical corpora. We discuss other applications of our rules for GO quality assurance, explore the issue of overgeneration, and provide examples of how similar methodologies could be applied to other biomedical terminologies. Additionally, we provide all generated synonyms for use by the text-mining community.
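The two rule types described in the abstract work together. The first recursively decomposes a term; the second generates derivational variants of the parts and recombines them. The following is a minimal Python sketch of that general idea only, not the authors' rule set. The rule tables `HEAD_TEMPLATES` and `ATOM_VARIANTS` and the functions `decompose`, `generate` and `synonyms` are hypothetical, written for illustration. Splitting on the first " of " is a deliberate simplification of how GO terms are actually structured.

```python
# Illustrative sketch of compositional synonym generation for GO-style terms.
# ASSUMPTION: the rule tables below are hypothetical toy rules, not the
# published rules; real GO terms need far richer decomposition patterns.

# Hypothetical derivational templates for "head" words; {x} is the argument.
HEAD_TEMPLATES = {
    "regulation": ["regulation of {x}", "{x} regulation", "regulating {x}", "regulates {x}"],
    "positive regulation": ["positive regulation of {x}", "up-regulation of {x}", "upregulates {x}"],
    "negative regulation": ["negative regulation of {x}", "down-regulation of {x}", "inhibits {x}"],
}

# Hypothetical variants for atomic (non-decomposable) terms.
ATOM_VARIANTS = {
    "cell proliferation": ["cell proliferation", "proliferation of cells"],
}


def decompose(term):
    """Recursively split '<head> of <rest>' when <head> has templates.

    Returns either an atomic string or a (head, subtree) tuple.
    """
    head, sep, rest = term.partition(" of ")
    if sep and head in HEAD_TEMPLATES:
        return (head, decompose(rest))
    return term


def generate(node):
    """Compositionally recombine every argument variant with every head template."""
    if isinstance(node, str):
        return ATOM_VARIANTS.get(node, [node])
    head, arg = node
    out = []
    for x in generate(arg):
        for template in HEAD_TEMPLATES[head]:
            out.append(template.format(x=x))
    return out


def synonyms(term):
    """Generated strings minus the original term, de-duplicated in order."""
    seen = {term}
    result = []
    for candidate in generate(decompose(term)):
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


if __name__ == "__main__":
    print(synonyms("positive regulation of cell proliferation"))
    # ['up-regulation of cell proliferation', 'upregulates cell proliferation',
    #  'positive regulation of proliferation of cells',
    #  'up-regulation of proliferation of cells', 'upregulates proliferation of cells']
```

Because every argument variant is crossed with every head template, the output grows multiplicatively with nesting depth. It can also produce awkward strings: with the plain "regulation" head, the sketch emits "proliferation of cells regulation". This is the kind of overgeneration the abstract says the paper examines.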

X Demographics


The data shown below were collected from the profiles of 4 X users who shared this research output.
Mendeley readers


The data shown below were compiled from readership statistics for 35 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Netherlands 1 3%
Australia 1 3%
Unknown 33 94%

Demographic breakdown

Readers by professional status Count As %
Student > Master 7 20%
Researcher 7 20%
Student > Ph. D. Student 5 14%
Student > Doctoral Student 2 6%
Student > Postgraduate 2 6%
Other 5 14%
Unknown 7 20%
Readers by discipline Count As %
Computer Science 14 40%
Agricultural and Biological Sciences 4 11%
Biochemistry, Genetics and Molecular Biology 1 3%
Linguistics 1 3%
Nursing and Health Professions 1 3%
Other 5 14%
Unknown 9 26%
Attention Score in Context


This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 May 2017.
All research outputs: #15,169,949 of 25,374,647 outputs
Outputs from Journal of Biomedical Semantics: #198 of 368 outputs
Outputs of similar age: #188,238 of 340,181 outputs
Outputs of similar age from Journal of Biomedical Semantics: #8 of 14 outputs
Altmetric has tracked 25,374,647 research outputs across all sources so far. This one is in the 38th percentile – i.e., 38% of other outputs scored the same or lower than it.
So far Altmetric has tracked 368 research outputs from this source. They receive a mean Attention Score of 4.6. This one is in the 43rd percentile – i.e., 43% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 340,181 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 43rd percentile – i.e., 43% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 14 others from the same source and published within six weeks on either side of this one. This one is in the 42nd percentile – i.e., 42% of its contemporaries scored the same or lower than it.