
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles

Overview of attention for article published in BMC Bioinformatics, August 2017

About this Attention Score

  • Average Attention Score compared to outputs of the same age
  • Average Attention Score compared to outputs of the same age and source

Mentioned by

  • X (Twitter): 5 users

Citations

  • Dimensions: 34

Readers on

  • Mendeley: 32
Title
Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles
Published in
BMC Bioinformatics, August 2017
DOI 10.1186/s12859-017-1775-9
Pubmed ID
Authors

K. Bretonnel Cohen, Arrick Lanfranchi, Miji Joo-young Choi, Michael Bada, William A. Baumgartner, Natalya Panteleyeva, Karin Verspoor, Martha Palmer, Lawrence E. Hunter

Abstract

Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached an F-measure of 0.46. Following the IDENTITY chains in the data would add 106,263 named entities in the full 97-paper corpus, an increase of 76% in the semantic classes of the eight ontologies annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature.
The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not generic in the biomedical domain, because they refer to specific classes in domain-specific ontologies. The comparison of a publicly available, well-understood coreference resolution system with a domain-adapted system produced results consistent with the notion that the requirements for successful coreference resolution in this genre differ substantially from those of the general domain, and it suggests that the baseline performance difference is quite large.
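The agreement figures and system scores above are reported with standard coreference metrics such as B3 (B-cubed). As a rough illustration only (not the evaluation code used in the paper), B3 scores each mention by the overlap between the gold chain and the predicted chain containing it, then averages over mentions:

```python
def b_cubed(key_chains, response_chains):
    """B-cubed precision, recall and F1 for coreference chains.

    key_chains, response_chains: lists of sets of mention ids, where
    each mention appears in exactly one chain on each side. Assumes
    the two sides cover at least one mention in common.
    """
    def chain_of(chains):
        # Map each mention id to the chain (set) that contains it.
        return {m: c for c in chains for m in c}

    key_of = chain_of(key_chains)
    resp_of = chain_of(response_chains)
    mentions = key_of.keys() & resp_of.keys()

    # Per-mention precision: overlap / size of the predicted chain.
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    # Per-mention recall: overlap / size of the gold chain.
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, predicting every mention as a singleton keeps per-mention precision at 1.0 but lowers recall for every mention whose gold chain has more than one member, which is why singleton-heavy output can still post a nontrivial B3 score.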

X Demographics

The data shown below were collected from the profiles of the 5 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 32 Mendeley readers of this research output.

Geographical breakdown

Country    Count  As %
Unknown       32  100%

Demographic breakdown

Readers by professional status  Count  As %
Researcher                          7   22%
Student > Master                    6   19%
Student > Ph.D. Student             3    9%
Student > Doctoral Student          2    6%
Professor                           2    6%
Other                               2    6%
Unknown                            10   31%
Readers by discipline                 Count  As %
Computer Science                         12   38%
Linguistics                               4   13%
Agricultural and Biological Sciences      3    9%
Business, Management and Accounting       1    3%
Medicine and Dentistry                    1    3%
Other                                     1    3%
Unknown                                  10   31%
Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 22 August 2017.
  • All research outputs: #13,520,363 of 23,327,904 outputs
  • Outputs from BMC Bioinformatics: #4,093 of 7,386 outputs
  • Outputs of similar age: #157,327 of 319,630 outputs
  • Outputs of similar age from BMC Bioinformatics: #48 of 86 outputs
Altmetric has tracked 23,327,904 research outputs across all sources so far. This one is in the 41st percentile – i.e., 41% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,386 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one is in the 42nd percentile – i.e., 42% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 319,630 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 49th percentile – i.e., 49% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 86 others from the same source and published within six weeks on either side of this one. This one is in the 44th percentile – i.e., 44% of its contemporaries scored the same or lower than it.
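The percentile statements above follow from the ranks listed earlier: roughly, the share of tracked outputs that ranked below this one. A minimal sketch of that arithmetic (the published Altmetric figures can differ by a point or so, presumably because tied scores are counted differently):

```python
def percentile_from_rank(rank, total):
    """Approximate percentile: share of outputs ranked strictly below
    this one, given a 1-based rank where 1 is the highest score.
    Tied scores shift the exact published figure slightly."""
    return 100.0 * (total - rank) / total

# Rank #13,520,363 of 23,327,904 tracked outputs comes out near the
# 42nd percentile by this naive formula; Altmetric reports the 41st.
print(round(percentile_from_rank(13_520_363, 23_327_904)))
```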