Title | Constructing a semantic predication gold standard from the biomedical literature |
---|---|
Published in | BMC Bioinformatics, December 2011 |
DOI | 10.1186/1471-2105-12-486 |
Pubmed ID | |
Authors | Halil Kilicoglu, Graciela Rosemblat, Marcelo Fiszman, Thomas C Rindflesch |
Abstract | Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 67% |
Japan | 1 | 33% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 2 | 67% |
Scientists | 1 | 33% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 5 | 5% |
Netherlands | 2 | 2% |
United Kingdom | 1 | <1% |
Brazil | 1 | <1% |
Russia | 1 | <1% |
Mexico | 1 | <1% |
Unknown | 96 | 90% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 19 | 18% |
Researcher | 19 | 18% |
Student > Master | 14 | 13% |
Other | 9 | 8% |
Student > Postgraduate | 9 | 8% |
Other | 22 | 21% |
Unknown | 15 | 14% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 39 | 36% |
Agricultural and Biological Sciences | 16 | 15% |
Medicine and Dentistry | 11 | 10% |
Linguistics | 7 | 7% |
Biochemistry, Genetics and Molecular Biology | 4 | 4% |
Other | 12 | 11% |
Unknown | 18 | 17% |