Altmetric – Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons


Title	Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons
Published in	Journal of Cheminformatics, February 2013
DOI	10.1186/1758-2946-5-9
Pubmed ID	23399299
Authors	Ana L Teixeira, João P Leal, Andre O Falcao
Abstract	One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures to more elaborate methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that can predict a certain property with good performance, in a computationally efficient and more robust way, since the presence of irrelevant or redundant features can cause poor generalization capacity. In this paper an alternative selection method, based on Random Forests to determine the variable importance, is proposed in the context of QSPR regression problems, with an application to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines, introducing the variables sequentially from a ranked list based on the variable importance.
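
The procedure outlined in the abstract (rank the descriptors by Random Forest variable importance, then train support vector machine models while introducing the variables one at a time from the ranked list) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses its impurity-based feature_importances_ as the importance measure (the paper's measure may differ), and runs on synthetic data rather than the curated enthalpy dataset. The function name rank_and_select, the SVR settings, and the 5-fold cross-validation are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rank_and_select(X, y, seed=0):
    """Rank features with a random forest, then score SVR models on growing prefixes.

    Hypothetical helper for illustration only; hyperparameters are assumptions.
    """
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    # Feature indices sorted from most to least important (impurity-based importance).
    ranking = np.argsort(forest.feature_importances_)[::-1]

    scores = []
    for k in range(1, len(ranking) + 1):
        # Train an SVR on the k top-ranked features; scaling matters for RBF kernels.
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
        cv = cross_val_score(
            model, X[:, ranking[:k]], y, cv=5, scoring="neg_mean_squared_error"
        )
        scores.append(float(-cv.mean()))  # mean cross-validated MSE for this prefix

    best_k = int(np.argmin(scores)) + 1  # prefix length with the lowest error
    return ranking, scores, best_k


# Synthetic stand-in data: only features 0 and 1 carry signal, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ranking, scores, best_k = rank_and_select(X, y)
print("ranked features:", ranking.tolist())
print("selected subset:", ranking[:best_k].tolist())
```

On real descriptor data the error curve over the ranked prefixes is usually inspected rather than reduced to a single minimum, since a slightly larger error with far fewer descriptors may be the better trade-off.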


X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Geographical breakdown

Country	Count	As %
Germany	1	100%

Demographic breakdown

Type	Count	As %
Scientists	1	100%

Mendeley readers

The data shown below were compiled from readership statistics for 81 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Portugal	1	1%
Germany	1	1%
India	1	1%
Canada	1	1%
China	1	1%
United States	1	1%
Unknown	75	93%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	23	28%
Student > Master	17	21%
Researcher	9	11%
Other	5	6%
Student > Postgraduate	4	5%
Other	11	14%
Unknown	12	15%

Readers by discipline	Count	As %
Chemistry	21	26%
Agricultural and Biological Sciences	10	12%
Computer Science	8	10%
Engineering	6	7%
Pharmacology, Toxicology and Pharmaceutical Science	4	5%
Other	19	23%
Unknown	13	16%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 12 February 2013.

All research outputs

#18,329,207

of 22,696,971 outputs

Outputs from Journal of Cheminformatics

#794

of 828 outputs

Outputs of similar age

#222,745

of 287,600 outputs

Outputs of similar age from Journal of Cheminformatics

#18

of 19 outputs

Altmetric has tracked 22,696,971 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.

So far Altmetric has tracked 828 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 287,600 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 11th percentile – i.e., 11% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 19 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
