Report for: Automatic classification of diseases from free-text death certificates for real-time surveillance

Title	Automatic classification of diseases from free-text death certificates for real-time surveillance
Published in	BMC Medical Informatics and Decision Making, July 2015
DOI	10.1186/s12911-015-0174-2
Pubmed ID	26174442
Authors	Bevan Koopman, Sarvnaz Karimi, Anthony Nguyen, Rhydwyn McGuire, David Muscatello, Madonna Kemp, Donna Truran, Ming Zhang, Sarah Thackway
Abstract	Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. 
The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
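
As a rough illustration of the keyword-matching approach and the evaluation measures named in the abstract, the Python sketch below matches a certificate's free text against per-disease patterns and computes precision, recall and F-measure for one disease. The keyword patterns and function names are hypothetical and are not the study's actual rules. F-measure is assumed here to be the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
import re

# Hypothetical keyword patterns, for illustration only.
# The study's actual rules are not reproduced in this report.
KEYWORD_RULES = {
    "diabetes": re.compile(r"\bdiabet\w*", re.IGNORECASE),
    "influenza": re.compile(r"\b(influenza|flu)\b", re.IGNORECASE),
    "pneumonia": re.compile(r"\bpneumoni\w*", re.IGNORECASE),
    "hiv": re.compile(r"\b(hiv|human immunodeficiency virus)\b", re.IGNORECASE),
}


def classify(certificate_text):
    """Return the set of diseases whose keyword pattern matches the text."""
    return {
        disease
        for disease, pattern in KEYWORD_RULES.items()
        if pattern.search(certificate_text)
    }


def precision_recall_f1(predicted, actual):
    """Compute precision, recall and F1 for one disease.

    predicted, actual: parallel lists of booleans, one entry per certificate.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)        # true positives
    fp = sum(p and not a for p, a in pairs)    # false positives
    fn = sum(a and not p for p, a in pairs)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With these illustrative patterns, a certificate reading "Bronchopneumonia" is not flagged as pneumonia, because the pattern requires a word boundary before "pneumoni". This is the kind of word-variation miss that the abstract's error analysis reports.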

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	1	50%
India	1	50%

Demographic breakdown

Type	Count	As %
Scientists	1	50%
Practitioners (doctors, other healthcare professionals)	1	50%

Mendeley readers

The data shown below were compiled from readership statistics for 91 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	1	1%
Switzerland	1	1%
Austria	1	1%
Unknown	88	97%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	19	21%
Student > Ph. D. Student	16	18%
Student > Master	12	13%
Other	7	8%
Student > Bachelor	5	5%
Other	14	15%
Unknown	18	20%

Readers by discipline	Count	As %
Computer Science	18	20%
Medicine and Dentistry	17	19%
Engineering	6	7%
Social Sciences	5	5%
Agricultural and Biological Sciences	3	3%
Other	17	19%
Unknown	25	27%

Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 July 2015.

Comparison group	Rank	Out of
All research outputs	#14,231,810	22,817,213 outputs
Outputs from BMC Medical Informatics and Decision Making	#1,101	1,988 outputs
Outputs of similar age	#135,351	262,607 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making	#23	37 outputs

Altmetric has tracked 22,817,213 research outputs across all sources so far. This one is in the 35th percentile – i.e., 35% of other outputs scored the same or lower than it.

So far Altmetric has tracked 1,988 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 38th percentile – i.e., 38% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 262,607 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 45th percentile – i.e., 45% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 37 others from the same source and published within six weeks on either side of this one. This one is in the 27th percentile – i.e., 27% of its contemporaries scored the same or lower than it.
