
Temporal bone radiology report classification using open source machine learning and natural langue processing libraries

Overview of attention for article published in BMC Medical Informatics and Decision Making, June 2016

About this Attention Score

  • Good Attention Score compared to outputs of the same age (70th percentile)
  • Good Attention Score compared to outputs of the same age and source (69th percentile)

Mentioned by

6 X users

Citations

17 Dimensions

Readers on

71 Mendeley
1 CiteULike
Title
Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
Published in
BMC Medical Informatics and Decision Making, June 2016
DOI 10.1186/s12911-016-0306-3
Pubmed ID
Authors

Aaron J. Masino, Robert W. Grundmeier, Jeffrey W. Pennington, John A. Germiller, E. Bryan Crenshaw

Abstract

Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region. Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. These were split into training (80 %) and test (20 %) sets. We applied open source libraries to normalize and convert every report to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set. Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90 %, 90 %, 93 %, and 82 % for the inner, middle, outer and mastoid regions, respectively. The logistic regression method was very consistent, achieving accuracy scores within 2.75 % of the best classifier across regions and a receiver operator characteristic area under the curve of 0.92 or greater across all regions. Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. 
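The pipeline the abstract describes — normalize each report, convert it to an n-gram feature vector, train a classifier per ear region, and evaluate on a held-out split — can be sketched with scikit-learn. This is a minimal illustration only: the toy reports, the unigram/bigram setting, and the classifier hyperparameters are assumptions, not the authors' actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy labeled reports: 1 = abnormality present in the target ear region.
reports = [
    "cochlear dysplasia noted in the left inner ear",
    "middle ear effusion with ossicular erosion",
    "normal temporal bone study, no abnormality",
    "mastoid air cells are well aerated and clear",
]
labels = [1, 1, 0, 0]  # e.g., "inner"-region labels for these toy reports

# Unigram + bigram features over lowercased text, then logistic regression.
model = Pipeline([
    ("ngrams", CountVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LogisticRegression()),
])

# Hold out part of the gold standard for testing (the paper used 80/20).
X_train, X_test, y_train, y_test = train_test_split(
    reports, labels, test_size=0.5, random_state=0, stratify=labels
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```

In the paper this per-region setup is repeated with several model families (SVMs, decision trees, random forests, naïve Bayes), and the best model is chosen per region from hold-out performance.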
The models described here are available in several free, open source libraries that make them more accessible and simplify their utilization as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and provides the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available.
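A web service of the kind described — accepting radiology report text in an HTTP request and returning predicted region labels — could look like the following Flask sketch. The route name, payload shape, and keyword-matching placeholder are illustrative assumptions; a real deployment would load the trained per-region classifiers instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

REGIONS = ["inner", "middle", "outer", "mastoid"]

def predict_regions(text):
    # Placeholder for the trained per-region classifiers; a real service
    # would apply each fitted model to the report text.
    return [r for r in REGIONS if r in text.lower()]

@app.route("/classify", methods=["POST"])
def classify():
    # Expect JSON like {"report": "<radiology report text>"}.
    report = request.get_json(force=True).get("report", "")
    return jsonify({"labels": predict_regions(report)})

if __name__ == "__main__":
    app.run()
```

Serving the models behind a single HTTP endpoint is what let the authors batch-label the existing AudGenDB reports and expose the classifiers to other consumers without redistributing the trained models themselves.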

X Demographics

The data shown below were collected from the profiles of 6 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 71 Mendeley readers of this research output.

Geographical breakdown

Country   Count   As %
Unknown   71      100%

Demographic breakdown

Readers by professional status   Count   As %
Researcher                       13      18%
Other                            7       10%
Student > Master                 7       10%
Student > Bachelor               7       10%
Student > Doctoral Student       6       8%
Other                            14      20%
Unknown                          17      24%
Readers by discipline                 Count   As %
Medicine and Dentistry                20      28%
Computer Science                      10      14%
Nursing and Health Professions        4       6%
Business, Management and Accounting   3       4%
Engineering                           3       4%
Other                                 9       13%
Unknown                               22      31%
Attention Score in Context

This research output has an Altmetric Attention Score of 5. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 07 June 2016.
All research outputs: #6,151,614 of 22,876,619 outputs
Outputs from BMC Medical Informatics and Decision Making: #553 of 1,993 outputs
Outputs of similar age: #99,022 of 340,764 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #10 of 33 outputs
Altmetric has tracked 22,876,619 research outputs across all sources so far. This one has received more attention than most of them and is in the 73rd percentile.
So far Altmetric has tracked 1,993 research outputs from this source, which receive a mean Attention Score of 4.9. This one has received more attention than average, scoring higher than 71% of its peers.
Older research outputs tend to score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 340,764 tracked outputs published within six weeks on either side of this one in any source. This one has received more attention than average, scoring higher than 70% of its contemporaries.
We can also compare this research output to the 33 others from the same source published within six weeks on either side of this one. This one has received more attention than average, scoring higher than 69% of its contemporaries.