
Parsing clinical text: how good are the state-of-the-art parsers?

Overview of attention for an article published in BMC Medical Informatics and Decision Making, May 2015

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by

2 X users

Readers on Mendeley

55 readers
Title
Parsing clinical text: how good are the state-of-the-art parsers?
Published in
BMC Medical Informatics and Decision Making, May 2015
DOI 10.1186/1472-6947-15-s1-s2
Authors

Min Jiang, Yang Huang, Jung-wei Fan, Buzhou Tang, Josh Denny, Hua Xu

Abstract

Parsing, which generates the syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed for general English, such as the Stanford parser, have been applied to clinical text, there have been no formal evaluations or comparisons of their performance in the medical domain. In this study, we investigated the performance of three state-of-the-art parsers, the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) a Treebank of 1,100 sentences randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers on the clinical Treebanks and evaluated their performance using 10-fold cross-validation. Finally, we re-trained the parsers on the clinical Treebanks combined with the Penn Treebank. Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measures in the range of 66.6%-70.3%) than on general English text. After retraining on the clinical Treebanks, all parsers achieved better performance; the Stanford parser performed best, reaching Bracketing F-measures of 73.68% on progress notes and 83.72% on the MiPACQ corpus under 10-fold cross-validation. When the clinical Treebanks were combined with the Penn Treebank, the Charniak parser achieved the highest Bracketing F-measure on progress notes (73.53%) and the Stanford parser the highest on the MiPACQ corpus (84.15%). Our study demonstrates that re-training on clinical Treebanks is critical for improving general English parsers' performance on clinical text, and that combining clinical and open-domain corpora may achieve optimal performance for parsing clinical text.
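The headline numbers in the abstract are PARSEVAL-style bracketing F-measures. As a rough illustration of what that metric measures, the Python sketch below extracts labelled constituent spans from Penn Treebank-style bracketed parses and scores a predicted tree against gold. It is a minimal reimplementation of the idea, not the evalb tool the authors would typically have used (evalb adds normalisations such as punctuation handling and label equivalences that are omitted here), and the example sentence and trees are invented for illustration.

    from collections import Counter


    def constituents(tree):
        """Collect (label, start, end) phrase spans from a bracketed parse,
        skipping POS-level preterminals such as (DT The)."""
        spans = []
        stack = []          # each entry: [label, start_token, saw_child_bracket]
        tokens = 0          # number of terminal words consumed so far
        i = 0
        while i < len(tree):
            ch = tree[i]
            if ch == "(":
                if stack:
                    stack[-1][2] = True        # parent now dominates a bracket
                j = i + 1
                while tree[j] not in " ()":
                    j += 1
                stack.append([tree[i + 1:j], tokens, False])
                i = j
            elif ch == ")":
                label, start, saw_bracket = stack.pop()
                if saw_bracket:                # only phrase-level brackets count
                    spans.append((label, start, tokens))
                i += 1
            elif ch in " \t\n":
                i += 1
            else:                              # a terminal word
                while i < len(tree) and tree[i] not in " ()":
                    i += 1
                tokens += 1
        return Counter(spans)


    def bracketing_prf(gold, predicted):
        """Labelled bracketing precision, recall, and F-measure (PARSEVAL)."""
        g, p = constituents(gold), constituents(predicted)
        matched = sum((g & p).values())        # multiset intersection of spans
        prec = matched / max(sum(p.values()), 1)
        rec = matched / max(sum(g.values()), 1)
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return prec, rec, f


    if __name__ == "__main__":
        # Invented example: the predicted parse misses the inner object NP.
        gold = "(S (NP (DT The) (NN patient)) (VP (VBZ denies) (NP (NN pain))))"
        pred = "(S (NP (DT The) (NN patient)) (VP (VBZ denies) (NN pain)))"
        print(bracketing_prf(gold, pred))      # -> (1.0, 0.75, ~0.857)

The cross-validated F-measures reported above would be obtained by averaging such per-sentence counts over held-out folds of each Treebank.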

X Demographics

The data shown below were collected from the profiles of the 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 55 Mendeley readers of this research output.

Geographical breakdown

Country          Count      %
United States        3     5%
Unknown             52    95%

Demographic breakdown

Readers by professional status    Count      %
Student > Ph. D. Student             14    25%
Researcher                           13    24%
Student > Master                      5     9%
Student > Doctoral Student            4     7%
Lecturer                              2     4%
Other                                 9    16%
Unknown                               8    15%

Readers by discipline                                  Count      %
Computer Science                                          22    40%
Medicine and Dentistry                                    12    22%
Linguistics                                                6    11%
Engineering                                                2     4%
Pharmacology, Toxicology and Pharmaceutical Science        1     2%
Other                                                      4     7%
Unknown                                                    8    15%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 08 June 2015.
All research outputs: #15,333,503 of 22,805,349 outputs
Outputs from BMC Medical Informatics and Decision Making: #1,312 of 1,988 outputs
Outputs of similar age: #156,755 of 266,611 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #32 of 43 outputs
Altmetric has tracked 22,805,349 research outputs across all sources so far. This one is in the 22nd percentile – i.e., 22% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,988 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 24th percentile – i.e., 24% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 266,611 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 32nd percentile – i.e., 32% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 43 others from the same source and published within six weeks on either side of this one. This one is in the 20th percentile – i.e., 20% of its contemporaries scored the same or lower than it.
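Taken literally, each of those percentile statements reduces to one computation: the share of a comparison cohort whose Attention Score is the same as or lower than this output's. The Python sketch below encodes that literal reading; Altmetric's exact handling of ties and score binning is not documented on this page, and the cohort values used here are invented for illustration.

    def attention_percentile(score, peer_scores):
        """Percentile as defined above: the percentage of peer outputs whose
        Attention Score is the same as or lower than this output's score."""
        same_or_lower = sum(1 for s in peer_scores if s <= score)
        return 100.0 * same_or_lower / len(peer_scores)


    # Toy cohort (invented values): an output scoring 1 among mostly-unmentioned peers.
    peers = [0, 0, 0, 1, 1, 2, 5, 9, 14, 30]
    print(attention_percentile(1, peers))   # -> 50.0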