
Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

Overview of attention for article published in Journal of Cheminformatics, February 2013

Mentioned by
1 X user

Citations
63 Dimensions

Readers on
81 Mendeley
Title: Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons
Published in: Journal of Cheminformatics, February 2013
DOI: 10.1186/1758-2946-5-9
Authors: Ana L Teixeira, João P Leal, Andre O Falcao

Abstract

A central problem in developing quantitative structure-property relationship (QSPR) predictive models is identifying the subset of variables that represent the structure of a molecule and predict a given property. Several automated feature selection methods exist, ranging from backward, forward, or stepwise procedures to more elaborate methodologies such as evolutionary programming. The challenge lies in selecting the minimum subset of descriptors that predicts a given property accurately, robustly, and at reasonable computational cost, since irrelevant or redundant features can degrade generalization. This paper proposes an alternative selection method for QSPR regression problems that uses Random Forests to determine variable importance, and applies it to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines, introducing the variables sequentially from a list ranked by variable importance.
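
The two-stage procedure described in the abstract (rank variables with a Random Forest, then train support vector machines on a growing prefix of the ranked list) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses its impurity-based `feature_importances_` as the importance measure (the paper may use a different one), runs on synthetic stand-in data instead of the curated enthalpy dataset, and the function names and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rank_features(X, y, n_trees=100, seed=0):
    """Return feature indices sorted from most to least important.

    Assumption: impurity-based importance from scikit-learn's forest.
    """
    forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1]


def sequential_svm_scores(X, y, ranking, cv=5):
    """Cross-validated R^2 of an SVR trained on the top-k ranked features,
    for k = 1 .. number of features."""
    scores = []
    for k in range(1, len(ranking) + 1):
        # Scale inputs before the RBF-kernel SVR; C is an illustrative value.
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
        r2 = cross_val_score(model, X[:, ranking[:k]], y, cv=cv, scoring="r2")
        scores.append(float(r2.mean()))
    return scores


# Synthetic stand-in data: only descriptors 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)

ranking = rank_features(X, y)
scores = sequential_svm_scores(X, y, ranking)

# Keep the shortest ranked prefix that achieves the best cross-validated score.
best_k = int(np.argmax(scores)) + 1
selected = ranking[:best_k]
print("ranked features:", ranking.tolist())
print("selected subset:", selected.tolist())
```

Picking the prefix with the best cross-validated score is only one possible stopping rule. A tolerance-based rule that prefers a smaller subset when the score difference is negligible would fit the stated goal of a minimum descriptor set equally well.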

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 81 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Portugal 1 1%
Germany 1 1%
India 1 1%
Canada 1 1%
China 1 1%
United States 1 1%
Unknown 75 93%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 23 28%
Student > Master 17 21%
Researcher 9 11%
Other 5 6%
Student > Postgraduate 4 5%
Other 11 14%
Unknown 12 15%

Readers by discipline Count As %
Chemistry 21 26%
Agricultural and Biological Sciences 10 12%
Computer Science 8 10%
Engineering 6 7%
Pharmacology, Toxicology and Pharmaceutical Science 4 5%
Other 19 23%
Unknown 13 16%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 12 February 2013.
All research outputs: #18,329,207 of 22,696,971 outputs
Outputs from Journal of Cheminformatics: #794 of 828 outputs
Outputs of similar age: #222,745 of 287,600 outputs
Outputs of similar age from Journal of Cheminformatics: #18 of 19 outputs
Altmetric has tracked 22,696,971 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 828 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 287,600 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 11th percentile – i.e., 11% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 19 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.