
Kernel based methods for accelerated failure time model with ultra-high dimensional data

Overview of attention for article published in BMC Bioinformatics, December 2010

Readers on

34 Mendeley
2 CiteULike
Title
Kernel based methods for accelerated failure time model with ultra-high dimensional data
Published in
BMC Bioinformatics, December 2010
DOI 10.1186/1471-2105-11-606
Pubmed ID
Authors

Zhenqiu Liu, Dechang Chen, Ming Tan, Feng Jiang, Ronald B Gartenhaus

Abstract

Most genomic data have ultra-high dimensions, with more than 10,000 genes (probes). Regularization methods with L₁ and L_p penalties have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n is smaller than the number of genes m, directly identifying a small subset of genes from ultra-high dimensional data (m > 10,000) is time-consuming and computationally inefficient. In current microarray analysis, the usual practice is to select a few thousand (or a few hundred) genes using univariate analysis or statistical tests, and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may introduce bias and inaccuracy and cause biologically important genes to be missed. The accelerated failure time (AFT) model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel-based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method is based on the kernel matrix and the dual problem, which involves a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes) is much larger than the number of samples. Moreover, the primal variables are explicitly updated and the sparsity in the solution is exploited. Our proposed methods can simultaneously identify survival-associated prognostic factors and predict survival outcomes with ultra-high dimensional genomic data. We have demonstrated the performance of our methods with both simulated and real data. The proposed method performed very well in our limited computational studies.
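The efficiency argument in the abstract rests on solving the dual problem, whose size is set by the number of samples n rather than the number of genes m. The following is a minimal sketch of that idea, not the authors' method: it assumes plain (non-adaptive) kernel ridge regression with a Gaussian kernel, uncensored responses standing in for log survival times, and no variable selection. The function names `rbf_kernel`, `solve`, `fit_kernel_ridge`, and `predict` are hypothetical and chosen for illustration only.

```python
import math
import random


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kernel_ridge(X, y, lam, gamma):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y.

    K is the n x n kernel matrix, so the linear system has n unknowns
    regardless of how many features (genes) each sample has.
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, y)


def predict(X_train, alpha, x_new, gamma):
    """Predict the response for x_new as sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * rbf_kernel(xi, x_new, gamma) for a, xi in zip(alpha, X_train))


if __name__ == "__main__":
    random.seed(0)
    n, m = 20, 500  # few samples, many features (assumed toy sizes)
    X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]  # stand-in for log survival times
    alpha = fit_kernel_ridge(X, y, lam=0.1, gamma=1.0 / m)
    print(len(alpha))  # one dual coefficient per sample
```

In this sketch the number of features m enters only through the kernel evaluations, which cost on the order of n² m operations; the linear solve itself involves an n × n system.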

Mendeley readers


The data shown below were compiled from readership statistics for 34 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Ecuador    1        3%
Germany    1        3%
Italy      1        3%
France     1        3%
Unknown    30       88%

Demographic breakdown

Readers by professional status     Count    As %
Student > Ph.D. Student            12       35%
Researcher                         9        26%
Professor > Associate Professor    3        9%
Student > Master                   3        9%
Professor                          2        6%
Other                              3        9%
Unknown                            2        6%

Readers by discipline                   Count    As %
Computer Science                        13       38%
Mathematics                             4        12%
Agricultural and Biological Sciences    4        12%
Engineering                             4        12%
Medicine and Dentistry                  3        9%
Other                                   2        6%
Unknown                                 4        12%