Automatic classification of diseases from free-text death certificates for real-time surveillance

Overview of attention for article published in BMC Medical Informatics and Decision Making, July 2015

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by: 2 X users
Citations: 45 (Dimensions)
Readers on Mendeley: 91

You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.

Published in: BMC Medical Informatics and Decision Making, July 2015
DOI: 10.1186/s12911-015-0174-2
Pubmed ID:
Authors:

Bevan Koopman, Sarvnaz Karimi, Anthony Nguyen, Rhydwyn McGuire, David Muscatello, Madonna Kemp, Donna Truran, Ming Zhang, Sarah Thackway

Abstract

Death certificates provide an invaluable source of mortality statistics, which can be used for surveillance, for early warning of increases in disease activity, and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from them, an aim hampered by both the volume of certificates and the variable nature of their natural-language text. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high-impact diseases of interest: diabetes, influenza, pneumonia and HIV.

Two classification methods are presented: (i) a machine learning approach, in which detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and (ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000 to 2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with the F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors.

Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification was more variable in effectiveness but still high (F-measure 0.80). The error analysis revealed that word variations, as well as certain word combinations, adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of effectiveness.

The high accuracy and low cost of the classification methods provide an effective means of automatic, real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free text besides death certificates.
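
A minimal sketch of the keyword-matching approach and of the evaluation measures named in the abstract is given here. The regular-expression rules are hypothetical placeholders rather than the study's rules, the certificates and gold labels are invented toy data, and the balanced F-measure (F1) is assumed. Precision (PPV) is TP / (TP + FP), recall (sensitivity) is TP / (TP + FN), and F1 = 2PR / (P + R).

```python
import re

# Hypothetical keyword rules, one list of regular expressions per disease.
# These are illustrative placeholders, NOT the rules used in the study.
RULES = {
    "diabetes": [r"\bdiabet\w*"],
    "influenza": [r"\binfluenza\b", r"\bflu\b"],
    "pneumonia": [r"pneumonia"],  # no leading \b, so it also matches "bronchopneumonia"
    "hiv": [r"\bhiv\b", r"human immunodeficiency virus"],
}

# Compile once; all matching is case-insensitive.
COMPILED = {
    disease: [re.compile(p, re.IGNORECASE) for p in patterns]
    for disease, patterns in RULES.items()
}


def classify(certificate: str) -> set[str]:
    """Return every disease whose keyword rules match the certificate text."""
    return {
        disease
        for disease, patterns in COMPILED.items()
        if any(p.search(certificate) for p in patterns)
    }


def precision_recall_f(gold, predicted, label):
    """Per-label precision (PPV), recall (sensitivity) and balanced F-measure."""
    tp = sum(label in g and label in p for g, p in zip(gold, predicted))
    fp = sum(label not in g and label in p for g, p in zip(gold, predicted))
    fn = sum(label in g and label not in p for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Invented toy certificates with hand-assigned gold labels.
certificates = [
    "Bronchopneumonia due to influenza A",
    "Ischaemic heart disease; type 2 diabetes mellitus",
    "Hyperglycaemic coma",
    "Cardiac arrest; no evidence of influenza",
]
gold = [{"pneumonia", "influenza"}, {"diabetes"}, {"diabetes"}, set()]
predicted = [classify(c) for c in certificates]

for label in ("diabetes", "influenza", "pneumonia"):
    p, r, f = precision_recall_f(gold, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy set, the negated mention in the last certificate produces a false positive for influenza (precision 0.50), and the certificate with no matching keyword produces a false negative for diabetes (recall 0.50), so both diseases end up with an F-measure of about 0.67.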

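The machine learning method in the abstract pairs extracted features with Support Vector Machines. The sketch below covers only the term and term n-gram features (SNOMED CT concept features would need a separate concept-recognition step and are omitted) and trains a binary linear SVM with the Pegasos stochastic sub-gradient method. Pegasos, the function names (`extract_features`, `train_linear_svm`, `predict`) and the toy data are assumptions chosen for illustration; the study's actual SVM implementation and settings are not described here.

```python
import re


def extract_features(text: str, max_n: int = 2) -> set[str]:
    """Binary bag of lower-cased terms and term n-grams (n = 1..max_n)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))
    return features


def score(weights: dict[str, float], features: set[str]) -> float:
    """Linear decision function w.x for a binary feature vector."""
    return sum(weights.get(f, 0.0) for f in features)


def train_linear_svm(docs, labels, lam=0.1, epochs=20):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos sub-gradient steps.

    Simplifications for this sketch (assumptions): no bias term, and examples
    are visited in a fixed cyclic order instead of being sampled at random.
    """
    weights: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(weights, x) < 1.0  # margin check uses pre-update weights
            for f in weights:  # shrink step from the L2 penalty
                weights[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for f in x:
                    weights[f] = weights.get(f, 0.0) + eta * y
    return weights


def predict(weights, text):
    """+1 if the certificate is classified as mentioning the disease, else -1."""
    return 1 if score(weights, extract_features(text)) > 0 else -1


# Invented toy training data for a single "influenza" classifier (+1 = influenza).
train_texts = [
    "Influenza A",
    "Viral influenza pneumonitis",
    "Bacterial pneumonia",
    "Diabetes mellitus",
    "Ischaemic heart disease",
]
train_labels = [1, 1, -1, -1, -1]

weights = train_linear_svm([extract_features(t) for t in train_texts], train_labels)
print(predict(weights, "Complications of influenza"))
```

Training one such binary classifier per disease is one natural reading of "a set of supervised machine learning models", but that reading is an assumption. The explicit shrink loop costs time proportional to the number of weights at every step, which is fine for a toy example; a practical implementation would keep a single scale factor for the weight vector or rely on an established SVM library.
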
X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 91 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 1 1%
Switzerland 1 1%
Austria 1 1%
Unknown 88 97%

Demographic breakdown

Readers by professional status Count As %
Researcher 19 21%
Student > Ph. D. Student 16 18%
Student > Master 12 13%
Other 7 8%
Student > Bachelor 5 5%
Other 14 15%
Unknown 18 20%
Readers by discipline Count As %
Computer Science 18 20%
Medicine and Dentistry 17 19%
Engineering 6 7%
Social Sciences 5 5%
Agricultural and Biological Sciences 3 3%
Other 17 19%
Unknown 25 27%
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 July 2015.
All research outputs: #14,231,810 of 22,817,213 outputs
Outputs from BMC Medical Informatics and Decision Making: #1,101 of 1,988 outputs
Outputs of similar age: #135,351 of 262,607 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #23 of 37 outputs
Altmetric has tracked 22,817,213 research outputs across all sources so far. This one is in the 35th percentile – i.e., 35% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,988 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 38th percentile – i.e., 38% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 262,607 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 45th percentile – i.e., 45% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 37 others from the same source and published within six weeks on either side of this one. This one is in the 27th percentile – i.e., 27% of its contemporaries scored the same or lower than it.