Title | Minimum redundancy maximum relevance feature selection approach for temporal gene expression data |
---|---|
Published in | BMC Bioinformatics, January 2017 |
DOI | 10.1186/s12859-016-1423-9 |
Pubmed ID | |
Authors | Milos Radovic, Mohamed Ghalwash, Nenad Filipovic, Zoran Obradovic |
Abstract | Feature selection, which aims to identify the subset of a possibly large set of features that is relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including the potentially temporal character of the data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without first flattening the data, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without prior flattening. In the proposed approach we compute the relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute the redundancy between genes by using a dynamic time warping approach. The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. The results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method improved accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as the average F-statistic value across time steps, with redundancy, which is calculated using a dynamic time warping approach. As our experiments show, incorporating temporal information into the feature selection process leads to the selection of more discriminative features. |
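
The two ingredients named in the abstract (relevance as an F-statistic averaged over time steps, redundancy from dynamic time warping) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy data, the conversion of a DTW distance into a similarity via `1 / (1 + distance)`, and the quotient-style greedy criterion are all assumptions made here for illustration; the abstract does not specify them.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F-statistic for one gene at one time step."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")


def relevance(series, labels):
    """Average the per-time-step F-statistic; `series` holds one time series per subject."""
    n_steps = len(series[0])
    return mean(f_statistic([s[t] for s in series], labels) for t in range(n_steps))


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]


def redundancy(series_a, series_b):
    """Assumed similarity: 1 / (1 + mean per-subject DTW distance), so 1.0 means identical."""
    dist = mean(dtw_distance(a, b) for a, b in zip(series_a, series_b))
    return 1.0 / (1.0 + dist)


def tmrmr_select(data, labels, k):
    """Greedy selection; relevance divided by mean redundancy is an assumed criterion."""
    rel = {g: relevance(s, labels) for g, s in data.items()}
    selected = [max(rel, key=rel.get)]
    while len(selected) < min(k, len(data)):
        def score(g):
            return rel[g] / mean(redundancy(data[g], data[s]) for s in selected)
        candidates = [g for g in data if g not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: 4 subjects (2 per class), 3 time steps per gene.
labels = [0, 0, 1, 1]
gene_a = [[1.0, 1.2, 1.1], [0.9, 1.1, 1.0], [2.0, 2.3, 2.1], [2.1, 2.2, 2.2]]
data = {
    "geneA": gene_a,
    "geneB": [list(s) for s in gene_a],  # exact duplicate of geneA
    "geneC": [[5.0, 4.0, 3.0], [5.6, 4.5, 3.4], [7.0, 6.0, 5.0], [7.5, 6.6, 5.5]],
}
selected = tmrmr_select(data, labels, 2)
print(selected)
```

On this toy input the exact duplicate `geneB` is passed over in favour of `geneC`, even though `geneB` has the higher relevance, because its redundancy with the already selected `geneA` is maximal. Note that raw DTW distances depend on the scale of the expression values, so a real pipeline would likely normalise each series first.
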
Twitter Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
China | 1 | <1% |
Unknown | 256 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 47 | 18% |
Student > Master | 36 | 14% |
Researcher | 32 | 12% |
Student > Bachelor | 27 | 11% |
Lecturer | 13 | 5% |
Other | 45 | 18% |
Unknown | 57 | 22% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 63 | 25% |
Engineering | 47 | 18% |
Biochemistry, Genetics and Molecular Biology | 18 | 7% |
Medicine and Dentistry | 16 | 6% |
Agricultural and Biological Sciences | 11 | 4% |
Other | 32 | 12% |
Unknown | 70 | 27% |