Report for: A robust data scaling algorithm to improve classification accuracies in biomedical data


Title	A robust data scaling algorithm to improve classification accuracies in biomedical data
Published in	BMC Bioinformatics, September 2016
DOI	10.1186/s12859-016-1236-x
Pubmed ID	27612635
Authors	Xi Hang Cao, Ivan Stojkovic, Zoran Obradovic
Abstract	Machine learning models have been adopted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks that have different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operating characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithms, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show that models learned from data scaled by the GL algorithm have higher accuracy than models learned from data scaled by the commonly used data scaling algorithms.
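
The scaling idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' GL algorithm: it assumes a plain two-parameter logistic curve instead of a generalized logistic function, and it matches that curve to the quartiles of the data instead of fitting the full empirical cumulative distribution function. The function names `fit_logistic_by_quartiles` and `logistic_scale` and the sample values are hypothetical.

```python
import math
import statistics


def fit_logistic_by_quartiles(values):
    """Return (b, m) for F(x) = 1 / (1 + exp(-b * (x - m))).

    Assumption: a standard logistic curve matched to the sample quartiles,
    used here as a crude stand-in for fitting a generalized logistic
    function to the empirical CDF.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("interquartile range must be positive")
    # A logistic CDF reaches 0.25 and 0.75 at m -/+ ln(3) / b,
    # so its interquartile range equals 2 * ln(3) / b.
    b = 2.0 * math.log(3.0) / iqr
    return b, median


def logistic_scale(values, b, m):
    """Map each value into [0, 1] with the fitted logistic curve."""
    scaled = []
    for x in values:
        z = b * (x - m)
        if z >= 0:
            scaled.append(1.0 / (1.0 + math.exp(-z)))
        else:
            # Equivalent form that avoids overflow for very negative z.
            e = math.exp(z)
            scaled.append(e / (1.0 + e))
    return scaled


# Hypothetical measurements with one extreme outlier (250.0).
values = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 250.0]
b, m = fit_logistic_by_quartiles(values)
scaled = logistic_scale(values, b, m)

# Min-max scaling of the same data, for comparison.
lo, hi = min(values), max(values)
min_max = [(x - lo) / (hi - lo) for x in values]

print([round(s, 3) for s in scaled])
print([round(s, 3) for s in min_max])
```

In this toy data set the outlier saturates near 1 under the logistic mapping while the remaining values stay spread across the interval, whereas Min-max scaling compresses them into a narrow band near 0. Whether this carries over to real biomedical data is exactly what the paper's experiments address; the sketch makes no claim about that.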


X Demographics

The data shown below were collected from the profiles of 11 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Singapore	1	9%
Germany	1	9%
United States	1	9%
Spain	1	9%
Unknown	7	64%

Demographic breakdown

Type	Count	As %
Scientists	8	73%
Members of the public	3	27%

Mendeley readers

The data shown below were compiled from readership statistics for 181 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
France	2	1%
Brazil	1	<1%
United Kingdom	1	<1%
Canada	1	<1%
Russia	1	<1%
Unknown	175	97%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	37	20%
Student > Master	29	16%
Student > Bachelor	25	14%
Researcher	17	9%
Student > Doctoral Student	7	4%
Other	13	7%
Unknown	53	29%

Readers by discipline	Count	As %
Computer Science	33	18%
Engineering	29	16%
Agricultural and Biological Sciences	11	6%
Biochemistry, Genetics and Molecular Biology	9	5%
Medicine and Dentistry	8	4%
Other	35	19%
Unknown	56	31%

Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 April 2017.

Comparison group	Rank	Out of
All research outputs	#5,954,816	24,558,777 outputs
Outputs from BMC Bioinformatics	#1,903	7,553 outputs
Outputs of similar age	#82,510	336,627 outputs
Outputs of similar age from BMC Bioinformatics	#34	126 outputs

Altmetric has tracked 24,558,777 research outputs across all sources so far. Compared to these, this one has done well and is in the 75th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.

So far Altmetric has tracked 7,553 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has gotten more attention than average, scoring higher than 73% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 336,627 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 74% of its contemporaries.

We're also able to compare this research output to 126 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 73% of its contemporaries.
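
The percentile figures quoted in this section can be roughly reproduced from the ranks listed above. The sketch below assumes the simplest possible formula, the share of outputs ranked below this one. How Altmetric actually computes its percentages (for example, how tied scores are handled) is not stated on this page, so the function `percentile_from_rank` is a hypothetical illustration and its results differ from the quoted values by up to about two points.

```python
def percentile_from_rank(rank, total):
    """Percentage of outputs ranked below the given position (rank 1 is best).

    Assumption: a naive formula that ignores ties between equal scores.
    """
    return 100.0 * (total - rank) / total


# Ranks and totals as listed in this report.
comparisons = {
    "All research outputs": (5_954_816, 24_558_777),
    "Outputs from BMC Bioinformatics": (1_903, 7_553),
    "Outputs of similar age": (82_510, 336_627),
    "Outputs of similar age from BMC Bioinformatics": (34, 126),
}

for label, (rank, total) in comparisons.items():
    print(f"{label}: {percentile_from_rank(rank, total):.1f}")
```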
