Title | Pooling annotated corpora for clinical concept extraction |
---|---|
Published in | Journal of Biomedical Semantics, January 2013 |
DOI | 10.1186/2041-1480-4-3 |
Pubmed ID | |
Authors | Kavishwar B Wagholikar, Manabu Torii, Siddhartha R Jonnalagadda, Hongfang Liu |
Abstract | The availability of annotated corpora has facilitated the application of machine learning algorithms to concept extraction from clinical notes. However, creating the annotations requires considerable expense and labor. A potential alternative is to reuse existing corpora from other institutions by pooling them with local corpora to train machine taggers. In this paper we investigate this approach by pooling corpora from the 2010 i2b2/VA NLP challenge and Mayo Clinic Rochester to evaluate taggers for recognition of medical problems. Both corpora were annotated for medical problems, but under different guidelines. The taggers were constructed using an existing tagging system, MedTagger, which consists of dictionary lookup, part-of-speech (POS) tagging, and machine learning for named entity prediction and concept extraction. We hope that our current work will be a useful case study for facilitating reuse of annotated corpora across institutions. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 50% |
Unknown | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 7% |
Netherlands | 1 | 3% |
Slovenia | 1 | 3% |
Unknown | 26 | 87% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 12 | 40% |
Student > Ph. D. Student | 6 | 20% |
Other | 3 | 10% |
Professor > Associate Professor | 3 | 10% |
Student > Master | 2 | 7% |
Other | 1 | 3% |
Unknown | 3 | 10% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 14 | 47% |
Linguistics | 3 | 10% |
Agricultural and Biological Sciences | 3 | 10% |
Physics and Astronomy | 2 | 7% |
Medicine and Dentistry | 2 | 7% |
Other | 3 | 10% |
Unknown | 3 | 10% |