Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia

Overview of attention for article published in BMC Medical Research Methodology, August 2015

Mentioned by: 1 X user
Citations: 57 (Dimensions)
Readers on Mendeley: 116
Title
Comparison of subset selection methods in linear regression in the context of health-related quality of life and substance abuse in Russia
Published in
BMC Medical Research Methodology, August 2015
DOI 10.1186/s12874-015-0066-2
Pubmed ID
Authors

Olga Morozova, Olga Levina, Anneli Uusküla, Robert Heimer

Abstract

Automatic stepwise subset selection methods in linear regression often perform poorly, both in selecting variables and in estimating coefficients and standard errors, especially when the number of independent variables is large and multicollinearity is present. Yet stepwise algorithms remain the dominant method in medical and epidemiological research. We investigated the performance of stepwise methods (backward elimination and forward selection using AIC, BIC, and the likelihood ratio test at p = 0.05 (LRT)) and of alternative subset selection methods in linear regression, including Bayesian model averaging (BMA) and penalized regression (lasso, adaptive lasso, and adaptive elastic net), in a dataset from a 2012-2013 cross-sectional study of drug users in St. Petersburg, Russia. The dependent variable measured health-related quality of life, and the 44 independent variables measured demographic, behavioral, and structural factors. In our case study, the methods returned models of different size and composition, ranging from 11 to 41 variables. The percentage of significant variables among those selected in the final model ranged from 27 % to 100 %. Model selection with stepwise methods was highly unstable, and most of the selected variables were significant (the 95 % confidence interval for the coefficient did not include zero); for backward elimination with BIC, forward selection with BIC, and backward elimination with LRT, all of them were. Compared with stepwise methods, adaptive elastic net demonstrated improved stability and more conservative estimates of coefficients and standard errors. By incorporating model uncertainty into subset selection and into the estimation of coefficients and their standard deviations, BMA returned a parsimonious model with the most conservative results in terms of covariate significance. BMA and adaptive elastic net performed best in our analysis.
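The stepwise algorithms compared above add or drop one predictor at a time and stop once an information criterion no longer improves. The following is a minimal sketch of forward selection for a Gaussian linear model, not the authors' code: it assumes AIC = n·log(RSS/n) + 2k and BIC = n·log(RSS/n) + k·log(n), both up to a constant shared by all models, and the helper names `gaussian_ic` and `forward_select` are hypothetical. Backward elimination runs the same loop in reverse, starting from the full model and removing the predictor whose deletion lowers the criterion most.

```python
import numpy as np


def gaussian_ic(X, y, cols, penalty):
    """Information criterion of an OLS fit of y on an intercept plus X[:, cols].

    Returns n * log(RSS / n) + penalty * k, where k counts the intercept and
    slopes. This equals -2 * Gaussian log-likelihood up to a constant shared by
    all models (the error variance is left out of k for the same reason).
    penalty=2 gives AIC; penalty=log(n) gives BIC.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + penalty * design.shape[1]


def forward_select(X, y, criterion="bic"):
    """Greedy forward selection: repeatedly add the predictor that lowers the
    criterion most; stop when no remaining predictor lowers it."""
    n, p = X.shape
    penalty = np.log(n) if criterion == "bic" else 2.0
    selected = []
    best = gaussian_ic(X, y, selected, penalty)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: gaussian_ic(X, y, selected + [j], penalty) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected


# Synthetic example (hypothetical data): only columns 0 and 3 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=200)
print(forward_select(X, y, "bic"), forward_select(X, y, "aic"))
```

Because the heavier BIC penalty demands a larger drop in RSS per added term, BIC stops no later than AIC along the same greedy path. Rerunning such a procedure on bootstrap resamples of correlated, real-world predictors is one way to probe the instability described above, since the selected set can change from resample to resample.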
Based on our results and previous theoretical studies, stepwise methods in medical and epidemiological research may be outperformed by alternative methods in cases such as ours. In situations of high uncertainty, it is beneficial to apply several methodologically sound subset selection methods and explore where their outputs do and do not agree. We recommend that researchers, at a minimum, explore model uncertainty and stability as part of their analyses and report these details in epidemiological papers.
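Exploring model uncertainty, as recommended above, can start from a quantity BMA reports for each predictor: its posterior inclusion probability, i.e., the total posterior weight of all models that contain it. The sketch below is a swapped-in approximation and not necessarily the study's BMA setup: it assumes a uniform prior over models, weights each model by exp(-BIC/2) (the usual BIC approximation to the marginal likelihood), and enumerates all 2^p subsets, which is feasible only for a handful of predictors; with 44 candidates a model-space search such as Occam's window or MCMC would be needed instead. The function names are hypothetical.

```python
import itertools

import numpy as np


def bic_linear(X, y, cols):
    """BIC (up to a constant shared by all models) of an OLS fit of y on an
    intercept plus X[:, cols]."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + np.log(n) * design.shape[1]


def bma_inclusion_probs(X, y):
    """Approximate posterior inclusion probability of each predictor.

    Assumes a uniform prior over all 2**p models and posterior model weights
    proportional to exp(-BIC / 2). Exhaustive enumeration: small p only.
    """
    p = X.shape[1]
    models = [cols for r in range(p + 1) for cols in itertools.combinations(range(p), r)]
    bics = np.array([bic_linear(X, y, cols) for cols in models])
    weights = np.exp(-0.5 * (bics - bics.min()))  # shift by the minimum to avoid underflow
    weights /= weights.sum()
    probs = np.zeros(p)
    for w, cols in zip(weights, models):
        for j in cols:
            probs[j] += w  # each model contributes its weight to every predictor it contains
    return probs


# Synthetic example (hypothetical data): only column 0 truly affects y.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = 2.0 * X[:, 0] + rng.normal(size=150)
print(np.round(bma_inclusion_probs(X, y), 3))
```

Predictors with inclusion probabilities near 1 are robust to the choice of model, while intermediate values flag the kind of uncertainty that a single selected model hides.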

X Demographics

X demographic data were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 116 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Malaysia 1 <1%
United Kingdom 1 <1%
Sudan 1 <1%
Unknown 113 97%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 21 18%
Researcher 13 11%
Student > Master 11 9%
Student > Doctoral Student 10 9%
Student > Bachelor 9 8%
Other 23 20%
Unknown 29 25%
Readers by discipline Count As %
Medicine and Dentistry 16 14%
Mathematics 8 7%
Social Sciences 8 7%
Computer Science 8 7%
Engineering 7 6%
Other 31 27%
Unknown 38 33%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 09 September 2015.
All research outputs: #20,493,050 of 25,183,822 outputs
Outputs from BMC Medical Research Methodology: #1,934 of 2,247 outputs
Outputs of similar age: #199,618 of 272,360 outputs
Outputs of similar age from BMC Medical Research Methodology: #25 of 27 outputs
Altmetric has tracked 25,183,822 research outputs across all sources so far. This one is in the 10th percentile – i.e., 10% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,247 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.4. This one is in the 6th percentile – i.e., 6% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 272,360 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 14th percentile – i.e., 14% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 27 others from the same source and published within six weeks on either side of this one. This one is in the 3rd percentile – i.e., 3% of its contemporaries scored the same or lower than it.