Title | Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features |
---|---|
Published in | BMC Bioinformatics, October 2011 |
DOI | 10.1186/1471-2105-12-412 |
Pubmed ID | |
Authors | Ozgur Demir-Kavuk, Mayumi Kamada, Tatsuya Akutsu, Ernst-Walter Knapp |
Abstract | Machine learning methods are nowadays used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. To build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited, as measurements have to be performed in a wet lab. Furthermore, the problems considered are often complex, so it is not clear which molecular descriptors (features) may be suitable to establish a strong correlation with the target property. In many applications all available descriptors are used. This can lead to difficult machine learning problems when thousands of descriptors are considered and only a few (e.g., fewer than a hundred) molecules are available for training. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 2% |
Malaysia | 1 | <1% |
Colombia | 1 | <1% |
Spain | 1 | <1% |
Sweden | 1 | <1% |
Unknown | 99 | 94% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 17 | 16% |
Researcher | 15 | 14% |
Student > Master | 13 | 12% |
Student > Bachelor | 12 | 11% |
Professor | 4 | 4% |
Other | 16 | 15% |
Unknown | 28 | 27% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 20 | 19% |
Engineering | 12 | 11% |
Agricultural and Biological Sciences | 10 | 10% |
Biochemistry, Genetics and Molecular Biology | 5 | 5% |
Mathematics | 5 | 5% |
Other | 16 | 15% |
Unknown | 37 | 35% |