
Temporal bone radiology report classification using open source machine learning and natural langue processing libraries

Overview of attention for article published in BMC Medical Informatics and Decision Making, June 2016

About this Attention Score

  • Good Attention Score compared to outputs of the same age (70th percentile)
  • Good Attention Score compared to outputs of the same age and source (69th percentile)

Mentioned by

6 X users

Citations

17 Dimensions

Readers on

71 Mendeley
1 CiteULike
Title
Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
Published in
BMC Medical Informatics and Decision Making, June 2016
DOI 10.1186/s12911-016-0306-3
Pubmed ID
Authors

Aaron J. Masino, Robert W. Grundmeier, Jeffrey W. Pennington, John A. Germiller, E. Bryan Crenshaw

Abstract

Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region. Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. These were split into training (80 %) and test (20 %) sets. We applied open source libraries to normalize and convert every report to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set. Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90 %, 90 %, 93 %, and 82 % for the inner, middle, outer and mastoid regions, respectively. The logistic regression method was very consistent, achieving accuracy scores within 2.75 % of the best classifier across regions and a receiver operator characteristic area under the curve of 0.92 or greater across all regions. Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. 
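The pipeline the abstract describes — normalize each report, convert it to an n-gram feature vector, train a classifier per ear region, and evaluate on a held-out split — can be sketched with scikit-learn. This is a minimal illustration only: the toy reports, the unigram/bigram setting, and the classifier hyperparameters are assumptions, not the authors' actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy labeled reports: 1 = abnormality present in the target ear region.
reports = [
    "cochlear dysplasia noted in the left inner ear",
    "middle ear effusion with ossicular erosion",
    "normal temporal bone study, no abnormality",
    "mastoid air cells are well aerated and clear",
]
labels = [1, 1, 0, 0]  # e.g., "inner"-region labels for these toy reports

# Unigram + bigram features over lowercased text, then logistic regression.
model = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LogisticRegression()),
])

# Hold out part of the gold standard for testing (the paper used 80/20).
X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, test_size=0.5, random_state=0, stratify=labels
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```

In the paper this per-region setup is repeated with several model families (SVMs, decision trees, random forests, naïve Bayes), and the best model is chosen per region from hold-out performance.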
The models described here are available in several free, open source libraries that make them more accessible and simplify their utilization as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and provides the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available.
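A web service of the kind described — accepting radiology report text in an HTTP request and returning predicted region labels — could look like the following Flask sketch. The route name, payload shape, and keyword-matching placeholder are illustrative assumptions; a real deployment would load the trained per-region classifiers instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

REGIONS = ["inner", "middle", "outer", "mastoid"]

def predict_regions(text):
    # Placeholder for the trained per-region classifiers; a real service
    # would apply each fitted model to the report text.
    return [r for r in REGIONS if r in text.lower()]

@app.route("/classify", methods=["POST"])
def classify():
    # Expect JSON like {"report": "<radiology report text>"}.
    report = request.get_json(force=True).get("report", "")
    return jsonify({"labels": predict_regions(report)})

if __name__ == "__main__":
    app.run()
```

Serving the models behind a single HTTP endpoint is what let the authors batch-label the existing AudGenDB reports and expose the classifiers to other consumers without redistributing the trained models themselves.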

X Demographics

The data shown below were collected from the profiles of 6 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 71 Mendeley readers of this research output.

Geographical breakdown

Country   Count   As %
Unknown   71      100%

Demographic breakdown

Readers by professional status   Count   As %
Researcher                       13      18%
Other                            7       10%
Student > Master                 7       10%
Student > Bachelor               7       10%
Student > Doctoral Student       6       8%
Other                            14      20%
Unknown                          17      24%
Readers by discipline                 Count   As %
Medicine and Dentistry                20      28%
Computer Science                      10      14%
Nursing and Health Professions        4       6%
Business, Management and Accounting   3       4%
Engineering                           3       4%
Other                                 9       13%
Unknown                               22      31%
Attention Score in Context

This research output has an Altmetric Attention Score of 5. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 07 June 2016.
All research outputs: #6,151,614 of 22,876,619 outputs
Outputs from BMC Medical Informatics and Decision Making: #553 of 1,993 outputs
Outputs of similar age: #99,022 of 340,764 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #10 of 33 outputs
Altmetric has tracked 22,876,619 research outputs across all sources so far. This one has received more attention than most of them and is in the 73rd percentile.
So far Altmetric has tracked 1,993 research outputs from this source, which receive a mean Attention Score of 4.9. This one has received more attention than average, scoring higher than 71% of its peers.
Older research outputs tend to score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 340,764 tracked outputs published within six weeks on either side of this one in any source. This one has received more attention than average, scoring higher than 70% of its contemporaries.
We can also compare this research output to the 33 others from the same source published within six weeks on either side of this one. This one has received more attention than average, scoring higher than 69% of its contemporaries.