Detection of sentence boundaries and abbreviations in clinical narratives

Overview of attention for article published in BMC Medical Informatics and Decision Making, June 2015

Mentioned by: 1 X user
Citations: 32 (Dimensions)
Readers on Mendeley: 67
Title: Detection of sentence boundaries and abbreviations in clinical narratives
Published in: BMC Medical Informatics and Decision Making, June 2015
DOI: 10.1186/1472-6947-15-s2-s4
Pubmed ID:
Authors: Markus Kreuzthaler, Stefan Schulz

Abstract

In Western languages the period character is highly ambiguous, due to its double role as sentence delimiter and abbreviation marker. This is particularly relevant in clinical free-texts characterized by numerous anomalies in spelling, punctuation, vocabulary and with a high frequency of short forms.

The problem is addressed by two binary classifiers for abbreviation and sentence detection. A support vector machine exploiting a linear kernel is trained on different combinations of feature sets for each classification task. Feature relevance ranking is applied to investigate which features are important for the particular task.

The methods are applied to German language texts from a medical record system, authored by specialized physicians. Two collections of 3,024 text snippets were annotated regarding the role of period characters for training and testing. Cohen's kappa was 0.98. For abbreviation and sentence boundary detection we can report an unweighted micro-averaged F-measure using a 10-fold cross validation of 0.97 for the training set. For test set based evaluation we obtained an unweighted micro-averaged F-measure of 0.95 for abbreviation detection and 0.94 for sentence delineation. Language-dependent resources and rules were found to have less impact on abbreviation detection than on sentence delineation.

Sentence detection is an important task, which should be performed at the beginning of a text processing pipeline. For the text genre under scrutiny we showed that support vector machines exploiting a linear kernel produce state-of-the-art results for sentence boundary detection. The results are comparable with other sentence boundary detection methods applied to English clinical texts. We identified abbreviation detection as a supportive task for sentence delineation.
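The two evaluation figures quoted in the abstract, Cohen's kappa and the micro-averaged F-measure, can be illustrated with a short sketch. This is not the authors' evaluation code: the function names, the labels `ABBR` and `EOS`, and the toy label lists are hypothetical, and the sketch assumes that each period receives exactly one label and that micro-averaging pools true positives, false positives and false negatives over both classes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one and the same label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure: pool TP, FP and FN over all classes."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for cls in set(gold) | set(predicted):
        for g, p in zip(gold, predicted):
            if p == cls and g == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif g == cls:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for six period characters
# (ABBR = abbreviation marker, EOS = end of sentence).
first = ["ABBR", "ABBR", "EOS", "EOS", "ABBR", "EOS"]
second = ["ABBR", "EOS", "EOS", "EOS", "ABBR", "EOS"]

print(round(cohens_kappa(first, second), 3))
print(round(micro_f_measure(first, second), 3))
```

Under the single-label assumption the micro-averaged F-measure coincides with plain accuracy, since every misclassified period counts once as a false positive and once as a false negative.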

X Demographics

The data for this section were collected from the profile of the 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 67 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1     1%
Austria              1     1%
Unknown             65    97%

Demographic breakdown

Readers by professional status        Count   As %
Researcher                               18    27%
Student > Master                         10    15%
Student > Ph. D. Student                  9    13%
Student > Doctoral Student                6     9%
Student > Bachelor                        5     7%
Other                                    10    15%
Unknown                                   9    13%

Readers by discipline                 Count   As %
Computer Science                         25    37%
Medicine and Dentistry                   13    19%
Linguistics                               6     9%
Engineering                               4     6%
Business, Management and Accounting       1     1%
Other                                     7    10%
Unknown                                  11    16%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 23 June 2015.
All research outputs: #18,417,643 of 22,815,414 outputs
Outputs from BMC Medical Informatics and Decision Making: #1,570 of 1,988 outputs
Outputs of similar age: #190,060 of 264,250 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #29 of 37 outputs
Altmetric has tracked 22,815,414 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,988 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 9th percentile – i.e., 9% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 264,250 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 16th percentile – i.e., 16% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 37 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
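
The percentile wording used above ("X% of other outputs scored the same or lower") can be read as the following calculation. This is only a sketch of that reading, not Altmetric's actual method: the function name and the list of comparison scores are hypothetical, and how Altmetric handles ties and rounding is not stated on this page.

```python
def percentile_same_or_lower(score, other_scores):
    """Percentage of the other outputs whose score is the same or lower."""
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100.0 * same_or_lower / len(other_scores)


# Hypothetical Attention Scores of ten other outputs (not real data).
others = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# An output with a score of 1 is compared against those ten.
print(percentile_same_or_lower(1, others))
```

Because many outputs can share the same low score, a percentile computed this way need not match the position implied by the rank alone.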