
The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities

Overview of attention for article published in BMC Bioinformatics, June 2015

Mentioned by: 1 X user
Citations: 12 (Dimensions)
Readers on Mendeley: 19

Published in: BMC Bioinformatics, June 2015
DOI: 10.1186/1471-2105-16-s10-s6
Authors: Thomas Lavergne, Cyril Grouin, Pierre Zweigenbaum

Abstract

The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect. The present work specifically addresses the issues it raises: (i) how to detect these co-reference links and the associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) which context, around which entity mentions, contributes to relation detection when co-reference chains are provided. We present experiments and results obtained both with gold entity mentions (task 2 of BioNLP-ST 2013) and with automatically detected entity mentions (end-to-end system, in task 3 of BioNLP-ST 2013). Our supervised mention detection system uses a linear-chain Conditional Random Fields (CRF) classifier, and our relation detection system relies on a Logistic Regression (also known as Maximum Entropy) classifier. Both use a set of morphological, morphosyntactic and semantic features. To minimize false inferences, co-reference resolution applies a set of heuristic rules designed to optimize precision. These rules take into account the types of the detected entity mentions and take advantage of the didactic nature of the texts in the corpus, where bacteria are, in a large proportion of cases, named fairly explicitly (although natural referring expressions such as "the bacteria" are common). The resulting system achieved an F-measure of 0.495 on the official test set when given the gold entity mentions as input, and 0.351 when given the entity mentions predicted by our CRF system; both scores are above that of the best BioNLP-ST 2013 participant system.
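Step (ii) can be pictured with a small sketch. The abstract does not spell out the exact labelling rule, so the rule used here is an assumption: a candidate (bacterium, location) mention pair is labelled positive when a gold relation links any mention in the bacterium's co-reference chain to any mention in the location's chain, and negative otherwise. The mention ids, the `chains` list of sets, the `gold_relations` set and both function names are hypothetical illustrations, not the authors' code.

```python
from itertools import product

def chain_of(mention, chains):
    """Return the co-reference chain containing `mention`, or a singleton chain."""
    for chain in chains:
        if mention in chain:
            return chain
    return {mention}

def label_candidate_pairs(bacteria, locations, chains, gold_relations):
    """Build (bacterium, location, is_positive) training examples.

    Assumed rule (hypothetical): a pair is positive if some gold relation links
    any mention of the bacterium's chain to any mention of the location's chain.
    """
    examples = []
    for b, loc in product(bacteria, locations):
        b_chain = chain_of(b, chains)
        loc_chain = chain_of(loc, chains)
        positive = any((bm, lm) in gold_relations
                       for bm, lm in product(b_chain, loc_chain))
        examples.append((b, loc, positive))
    return examples


# Hypothetical document: B2 ("the bacteria") co-refers with B1.
examples = label_candidate_pairs(
    bacteria=["B1", "B2"],
    locations=["H1", "H2"],
    chains=[{"B1", "B2"}],
    gold_relations={("B1", "H1")},
)
print(examples)
# [('B1', 'H1', True), ('B1', 'H2', False), ('B2', 'H1', True), ('B2', 'H2', False)]
```

In this toy example the anaphoric mention `B2` inherits the positive label of its antecedent `B1`; under a chain-unaware rule the pair (`B2`, `H1`) would have been labelled negative.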
We show that co-reference resolution substantially improves over a baseline system that does not use co-reference information: by about 3.5 F-measure points on the test corpus for the end-to-end system (5.5 points on the development corpus), and by 7 F-measure points on both the development and test corpora when gold mentions are used. Although this outperforms the best published system on the BioNLP-ST 2013 Bacteria Biotope dataset, we consider it mainly a stronger baseline on which further work can build. We also emphasize the importance and difficulty of designing a comprehensive gold-standard co-reference annotation, which we explain is key to further progress on the task.
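The scores above are F-measures. The abstract does not restate the formula; assuming the usual balanced F-measure (F1), it is the harmonic mean of precision P and recall R, so an F-measure of 0.495 corresponds to 49.5 points and a gain of 7 F-measure points to +0.07 on the 0 to 1 scale. A minimal sketch, with a hypothetical function name and made-up precision/recall values:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives F1 = 2PR / (P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # convention: F is 0 when nothing correct is found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision/recall values, not results from the paper.
print(f_measure(0.5, 0.5))            # 0.5
print(round(f_measure(0.6, 0.4), 3))  # 0.48
```

Because it is a harmonic mean, F1 is pulled towards the lower of P and R, so precision-oriented rules only raise F1 if they do not cost too much recall.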

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 19 Mendeley readers of this research output.

Geographical breakdown

Country   Count  As %
Unknown      19  100%

Demographic breakdown

Readers by professional status    Count  As %
Researcher                            5   26%
Student > Master                      4   21%
Student > Ph. D. Student              3   16%
Professor                             2   11%
Professor > Associate Professor       2   11%
Other                                 0    0%
Unknown                               3   16%

Readers by discipline                          Count  As %
Computer Science                                   5   26%
Medicine and Dentistry                             2   11%
Economics, Econometrics and Finance                2   11%
Biochemistry, Genetics and Molecular Biology       1    5%
Agricultural and Biological Sciences               1    5%
Other                                              3   16%
Unknown                                            5   26%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 July 2015.
All research outputs: #18,418,919 of 22,817,213 outputs
Outputs from BMC Bioinformatics: #6,314 of 7,284 outputs
Outputs of similar age: #189,577 of 263,947 outputs
Outputs of similar age from BMC Bioinformatics: #98 of 109 outputs
Altmetric has tracked 22,817,213 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,284 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 5th percentile – i.e., 5% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 263,947 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 16th percentile – i.e., 16% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 109 others from the same source and published within six weeks on either side of this one. This one is in the 6th percentile – i.e., 6% of its contemporaries scored the same or lower than it.
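Each percentile statement above reads as "the share of compared outputs that scored the same or lower". A minimal sketch of that definition, assuming plain numeric scores and rounding down to a whole percentile; the function name and peer scores are hypothetical, and this is not Altmetric's implementation, so it is not expected to reproduce the exact figures above (tie handling and rounding are not described on this page).

```python
from bisect import bisect_right
from typing import Sequence

def percentile_same_or_lower(score: float, other_scores: Sequence[float]) -> int:
    """Whole-number percentage of other outputs whose score is <= `score`.

    Rounding down is an assumption made for this sketch.
    """
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    ranked = sorted(other_scores)
    # bisect_right counts every score <= `score`, so ties count as "the same or lower"
    at_or_below = bisect_right(ranked, score)
    return 100 * at_or_below // len(ranked)


# Hypothetical peer scores, not data from this page.
peers = [0, 0, 1, 1, 2, 3, 4, 5, 8, 10]
print(percentile_same_or_lower(1, peers))  # 40
```

Counting ties as "the same or lower" is what lets an output with a low score such as 1 still sit above the lowest percentiles when many peers share that score.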