Title | An active learning based classification strategy for the minority class problem: application to histopathology annotation |
---|---|
Published in | BMC Bioinformatics, October 2011 |
DOI | 10.1186/1471-2105-12-424 |
Pubmed ID | |
Authors |
Scott Doyle, James Monaco, Michael Feldman, John Tomaszewski, Anant Madabhushi |
Abstract |
Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", in which exemplars from the non-target class outnumber target class exemplars, which can bias the classifier and reduce accuracy. In this paper, we develop a training strategy combining active learning (AL) with class-balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set than random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images. Pre-specifying a target class ratio mitigates the problem of training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem. |
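
The idea in the abstract, querying informative samples while holding the training classes in balance and counting every annotation as cost, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 1-D toy data, the nearest-centroid uncertainty score, the fixed 1:1 target ratio, and the names `balanced_active_learning`, `per_class`, and `max_queries` are all assumptions made for illustration.

```python
import random


def centroid(values):
    return sum(values) / len(values)


def balanced_active_learning(features, oracle, seed_train, per_class, max_queries):
    """Query uncertain samples until each class holds `per_class` examples.

    Returns (training_set, annotations_used). Every oracle call counts as
    one annotation, even if the labeled sample is then discarded because
    its class is already full. `seed_train` must contain both classes.
    """
    train = list(seed_train)
    counts = {0: 0, 1: 0}
    for _, y in train:
        counts[y] += 1
    pool = list(range(len(features)))
    cost = 0
    while pool and cost < max_queries and min(counts.values()) < per_class:
        c0 = centroid([x for x, y in train if y == 0])
        c1 = centroid([x for x, y in train if y == 1])
        # Assumed uncertainty score: a sample nearly equidistant from both
        # class centroids is treated as the most informative one.
        i = min(pool, key=lambda j: abs(abs(features[j] - c0) - abs(features[j] - c1)))
        pool.remove(i)
        label = oracle(i)  # simulated expert annotation
        cost += 1
        if counts[label] < per_class:  # class-balancing step
            train.append((features[i], label))
            counts[label] += 1
    return train, cost


# Toy pool with a 9:1 imbalance: class 1 plays the minority (target) class.
rng = random.Random(0)
labels = [0] * 180 + [1] * 20
features = [rng.gauss(3.0 if y else 0.0, 1.0) for y in labels]

train, cost = balanced_active_learning(
    features,
    oracle=lambda i: labels[i],
    seed_train=[(0.0, 0), (3.0, 1)],  # hand-made seed, one sample per class
    per_class=10,
    max_queries=len(features),
)
print(len(train), "training samples after", cost, "annotations")
```

The gap between `cost` and the number of samples actually kept is the overhead of balancing: annotations spent on majority-class samples that arrive after that class is already full. The abstract's mathematical model addresses predicting this cost; the sketch only measures it empirically on toy data.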
Twitter Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Brazil | 4 | 4% |
Colombia | 2 | 2% |
Germany | 1 | 1% |
Netherlands | 1 | 1% |
France | 1 | 1% |
Sweden | 1 | 1% |
United Kingdom | 1 | 1% |
United States | 1 | 1% |
Unknown | 83 | 87% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 19 | 20% |
Researcher | 17 | 18% |
Student > Bachelor | 10 | 11% |
Student > Master | 9 | 9% |
Other | 7 | 7% |
Other | 23 | 24% |
Unknown | 10 | 11% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 31 | 33% |
Medicine and Dentistry | 15 | 16% |
Engineering | 12 | 13% |
Agricultural and Biological Sciences | 9 | 9% |
Chemistry | 6 | 6% |
Other | 11 | 12% |
Unknown | 11 | 12% |