
The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

Overview of attention for article published in Journal of Cheminformatics, January 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (92nd percentile)
  • High Attention Score compared to outputs of the same age and source (87th percentile)

Mentioned by

  • 2 blogs
  • 8 X users
  • 1 Facebook page
  • 4 Wikipedia pages

Citations

  • 60 Dimensions

Readers on

  • 78 Mendeley
  • 1 CiteULike
Title
The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS
Published in
Journal of Cheminformatics, January 2016
DOI 10.1186/s13321-016-0113-y
Pubmed ID
Authors

Igor V. Tetko, Daniel M. Lowe, Antony J. Williams

Abstract

Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules, while far more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models. We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melted, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C.
Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
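The matrix size quoted in the abstract can be checked with a rough calculation. The sketch below is not from the article: it assumes one matrix row per data point (about 300,000) and one column per descriptor (about 700,000), 32-bit values, and a purely hypothetical average of 1,000 non-zero descriptors per compound. It only illustrates why sparse storage matters at this scale.

```python
# Back-of-the-envelope sizing. Counts are rounded from the abstract;
# everything marked "assumption" is illustrative and not from the article.
N_COMPOUNDS = 300_000        # "almost 300,000 data points"
N_DESCRIPTORS = 700_000      # "more than 700,000 descriptors"
BYTES_PER_VALUE = 4          # assumption: 32-bit floating-point values
BYTES_PER_INDEX = 4          # assumption: 32-bit column indices and row pointers
ASSUMED_NNZ_PER_ROW = 1_000  # assumption: hypothetical non-zeros per compound


def dense_entries(rows: int, cols: int) -> int:
    """Number of cells in a fully populated rows x cols matrix."""
    return rows * cols


def dense_bytes(rows: int, cols: int) -> int:
    """Memory needed to store every cell explicitly."""
    return dense_entries(rows, cols) * BYTES_PER_VALUE


def sparse_bytes(rows: int, nnz_per_row: int) -> int:
    """Rough CSR-style cost: a value and a column index per non-zero,
    plus one row pointer per row and a final end pointer."""
    nnz = rows * nnz_per_row
    return nnz * (BYTES_PER_VALUE + BYTES_PER_INDEX) + (rows + 1) * BYTES_PER_INDEX


if __name__ == "__main__":
    print(f"entries:      {dense_entries(N_COMPOUNDS, N_DESCRIPTORS):,}")
    print(f"dense bytes:  {dense_bytes(N_COMPOUNDS, N_DESCRIPTORS):,}")
    print(f"sparse bytes: {sparse_bytes(N_COMPOUNDS, ASSUMED_NNZ_PER_ROW):,}")
```

Under these assumptions the dense matrix has 210,000,000,000 entries, consistent with the ">200,000,000,000" figure, and would need about 840 GB, while the sparse layout needs roughly 2.4 GB. The real figures depend on the actual descriptor density, which this page does not report.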

X Demographics

The data shown below were collected from the profiles of 8 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 78 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1     1%
Bulgaria             1     1%
Germany              1     1%
Unknown             75    96%

Demographic breakdown

Readers by professional status                         Count   As %
Researcher                                                22    28%
Student > Ph. D. Student                                  18    23%
Student > Bachelor                                         9    12%
Other                                                      4     5%
Student > Doctoral Student                                 3     4%
Other                                                     12    15%
Unknown                                                   10    13%

Readers by discipline                                  Count   As %
Chemistry                                                 22    28%
Computer Science                                          10    13%
Pharmacology, Toxicology and Pharmaceutical Science        8    10%
Engineering                                                5     6%
Medicine and Dentistry                                     4     5%
Other                                                     14    18%
Unknown                                                   15    19%
Attention Score in Context

This research output has an Altmetric Attention Score of 21. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 27 November 2021.
All research outputs: #1,680,387 of 24,143,470 outputs
Outputs from Journal of Cheminformatics: #136 of 891 outputs
Outputs of similar age: #30,391 of 403,573 outputs
Outputs of similar age from Journal of Cheminformatics: #3 of 16 outputs
Altmetric has tracked 24,143,470 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 93rd percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 891 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.7. This one has done well, scoring higher than 84% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 403,573 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 92% of its contemporaries.
We're also able to compare this research output to 16 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 87% of its contemporaries.
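The "scoring higher than N%" figures above can be related to the rank and cohort size shown for each comparison. This page does not state how Altmetric computes them, so the sketch below is only one reading that happens to reproduce all four percentages: count the outputs ranked at or below this one, divide by the cohort size, and round down. The function name is hypothetical.

```python
def percent_scored_higher_than(rank: int, total: int) -> int:
    """Whole-number share of a cohort ranked at or below `rank`.

    Assumption, not Altmetric's documented method: rank 1 is the best
    score, and the percentage is rounded down.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    outputs_not_ranked_above = total - (rank - 1)
    return outputs_not_ranked_above * 100 // total


if __name__ == "__main__":
    comparisons = {
        "All research outputs": (1_680_387, 24_143_470),
        "Journal of Cheminformatics": (136, 891),
        "Similar age": (30_391, 403_573),
        "Similar age, same journal": (3, 16),
    }
    for label, (rank, total) in comparisons.items():
        print(f"{label}: {percent_scored_higher_than(rank, total)}%")
```

With the ranks listed on this page, this gives 93%, 84%, 92% and 87% respectively, matching the figures quoted in the text.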