Title | Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources |
---|---|
Published in | Journal of Biomedical Semantics, October 2013 |
DOI | 10.1186/2041-1480-4-28 |
Pubmed ID | |
Authors | Dietrich Rebholz-Schuhmann, Senay Kafkas, Jee-Hyub Kim, Chen Li, Antonio Jimeno Yepes, Robert Hoehndorf, Rolf Backofen, Ian Lewin |
Abstract | The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates into PGN tagging solutions, and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Spain | 1 | 50% |
Switzerland | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 1 | 50% |
Members of the public | 1 | 50% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Mexico | 1 | 4% |
Croatia | 1 | 4% |
Netherlands | 1 | 4% |
Unknown | 24 | 89% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 7 | 26% |
Student > Bachelor | 4 | 15% |
Other | 4 | 15% |
Student > Ph. D. Student | 3 | 11% |
Student > Master | 3 | 11% |
Other | 3 | 11% |
Unknown | 3 | 11% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 9 | 33% |
Agricultural and Biological Sciences | 6 | 22% |
Engineering | 2 | 7% |
Medicine and Dentistry | 2 | 7% |
Psychology | 1 | 4% |
Other | 3 | 11% |
Unknown | 4 | 15% |