Imbalanced target prediction with pattern discovery on clinical data repositories

Overview of attention for article published in BMC Medical Informatics and Decision Making, April 2017

Mentioned by
2 X users

Citations
12 Dimensions

Readers on
34 Mendeley
Title
Imbalanced target prediction with pattern discovery on clinical data repositories
Published in
BMC Medical Informatics and Decision Making, April 2017
DOI 10.1186/s12911-017-0443-3
Pubmed ID
Authors

Tak-Ming Chan, Yuxi Li, Choo-Chiap Chiau, Jane Zhu, Jie Jiang, Yong Huo

Abstract

Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets before a formal study is conducted. We specifically target interpretability for domain users, so that the model can be conveniently explained and applied in clinical practice. We propose an interpretable pattern model that is tolerant of noise (missing values) in practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths at a rate of less than a few percent, the geometric mean of sensitivity and specificity (G-mean) is employed as the optimization criterion, for which a simple but effective heuristic algorithm is developed. We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. The thoracic dataset contains 14.9% deaths within 1 year, and the cardiac dataset contains 9.1% deaths. Despite the imbalance challenge observed for the other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for potential formal studies.
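
The two evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function names and the confusion-matrix counts below are hypothetical, chosen only to resemble a cohort with roughly 9% positive cases. It shows why G-mean is a stricter criterion than accuracy when the target is rare.

```python
import math


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and sensitivity."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical cohort: 100 patients, 9 deaths (made-up counts, not from the paper).
# A classifier that always predicts "survives" never finds a death.
always_negative = dict(tp=0, fp=0, tn=91, fn=9)
# A classifier that finds 6 of the 9 deaths at the cost of 13 false alarms.
balanced = dict(tp=6, fp=13, tn=78, fn=3)

for name, counts in [("always negative", always_negative), ("balanced", balanced)]:
    print(name, accuracy(**counts), g_mean(**counts), f1_score(**counts))
```

In this made-up example the always-negative classifier has the higher accuracy (0.91 versus 0.84) but a G-mean and F1-score of zero, because it detects none of the rare positive cases.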

X Demographics

The data shown below were collected from the profiles of 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 34 Mendeley readers of this research output.

Geographical breakdown

Country    Count  As %
Unknown       34  100%

Demographic breakdown

Readers by professional status    Count  As %
Student > Ph. D. Student              6   18%
Researcher                            4   12%
Student > Bachelor                    4   12%
Student > Doctoral Student            3    9%
Student > Postgraduate                3    9%
Other                                 9   26%
Unknown                               5   15%

Readers by discipline             Count  As %
Computer Science                      8   24%
Engineering                           4   12%
Medicine and Dentistry                3    9%
Nursing and Health Professions        2    6%
Social Sciences                       2    6%
Other                                 7   21%
Unknown                               8   24%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 April 2017.
All research outputs: #17,887,790 of 22,965,074 outputs
Outputs from BMC Medical Informatics and Decision Making: #1,510 of 2,001 outputs
Outputs of similar age: #220,922 of 310,204 outputs
Outputs of similar age from BMC Medical Informatics and Decision Making: #26 of 34 outputs
Altmetric has tracked 22,965,074 research outputs across all sources so far. This one is in the 19th percentile – i.e., 19% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,001 research outputs from this source. They receive a mean Attention Score of 4.9. This one is in the 21st percentile – i.e., 21% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 310,204 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 24th percentile – i.e., 24% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 34 others from the same source and published within six weeks on either side of this one. This one is in the 14th percentile – i.e., 14% of its contemporaries scored the same or lower than it.