Title | Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation |
---|---|
Published in | Journal of Cheminformatics, November 2014 |
DOI | 10.1186/s13321-014-0047-1 |
Pubmed ID | |
Authors | Désirée Baumann, Knut Baumann |
Abstract | Generally, QSAR modelling requires both model selection and validation since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging, especially under model uncertainty, and requires independent test objects. These test objects must not be involved in either model building or model selection. Double cross-validation, sometimes also termed nested cross-validation, offers an attractive way to generate test data and to select QSAR models since it uses the data very efficiently. Nevertheless, there is controversy in the literature about the reliability of double cross-validation under model uncertainty. Moreover, systematic studies investigating the adequate parameterization of double cross-validation are still missing. Here, the cross-validation design in the inner loop and the influence of the test set size in the outer loop are systematically studied for regression models in combination with variable selection. |
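The double (nested) cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain k-fold splits in both loops and a deliberately simple form of variable selection (picking the single best descriptor for a one-variable linear regression). The names `fit_simple`, `folds`, `cv_error`, and `double_cv`, and the toy data, are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_simple(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def folds(indices, k):
    """Split indices into k interleaved folds (deterministic, no shuffling)."""
    return [indices[i::k] for i in range(k)]

def cv_error(X, y, idx, var, k):
    """k-fold CV mean squared error of the model using only column `var`,
    computed on the rows listed in `idx`."""
    sq = []
    for held in folds(idx, k):
        train = [i for i in idx if i not in held]
        a, b = fit_simple([X[i][var] for i in train], [y[i] for i in train])
        sq += [(y[i] - (a + b * X[i][var])) ** 2 for i in held]
    return mean(sq)

def double_cv(X, y, k_outer=4, k_inner=3):
    """Return (outer-loop mean squared prediction error, variable chosen per outer fold)."""
    n, p = len(X), len(X[0])
    sq, chosen = [], []
    for test in folds(list(range(n)), k_outer):
        train = [i for i in range(n) if i not in test]
        # Inner loop: model (variable) selection sees only the outer training rows.
        best = min(range(p), key=lambda j: cv_error(X, y, train, j, k_inner))
        # Refit the selected model on all outer training rows.
        a, b = fit_simple([X[i][best] for i in train], [y[i] for i in train])
        # Outer loop: test rows took no part in fitting or selection.
        sq += [(y[i] - (a + b * X[i][best])) ** 2 for i in test]
        chosen.append(best)
    return mean(sq), chosen

# Toy data (assumed): the response depends only on the first descriptor.
X = [[float(i), float((i * 7) % 5)] for i in range(12)]
y = [2.0 * row[0] + 1.0 for row in X]
mse, chosen = double_cv(X, y)
```

The key design point is that the rows held out in the outer loop are used only for the final error estimate, so the estimate is not biased by the selection step. The sizes of the inner and outer splits (`k_inner`, `k_outer` here) are the kind of parameters the abstract says the study examines.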
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
Germany | 1 | 50% |
United Kingdom | 1 | 50% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 2 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 1% |
Sweden | 1 | 1% |
Bulgaria | 1 | 1% |
Germany | 1 | 1% |
Unknown | 94 | 96% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 28 | 29% |
Researcher | 15 | 15% |
Student > Master | 14 | 14% |
Student > Bachelor | 8 | 8% |
Student > Postgraduate | 4 | 4% |
Other | 10 | 10% |
Unknown | 19 | 19% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 37 | 38% |
Agricultural and Biological Sciences | 7 | 7% |
Computer Science | 5 | 5% |
Pharmacology, Toxicology and Pharmaceutical Science | 4 | 4% |
Medicine and Dentistry | 4 | 4% |
Other | 15 | 15% |
Unknown | 26 | 27% |