Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents

Overview of attention for article published in Journal of Cheminformatics, October 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (90th percentile)
  • High Attention Score compared to outputs of the same age and source (85th percentile)

Mentioned by

  • 1 blog
  • 14 X users

Citations

  • 23 Dimensions

Readers on

  • 50 Mendeley
  • 3 CiteULike
Title: Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents
Published in: Journal of Cheminformatics, October 2015
DOI: 10.1186/s13321-015-0097-z
Authors: Stefan Senger, Luca Bartek, George Papadatos, Anna Gaulton

Abstract

First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases.

When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51% were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59%, respectively, of the compound-patent pairs obtained from Reaxys.

In our comparison of automatically generated vs. manually curated patent chemistry databases, the former successfully provided approximately 60% of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant 'gold standards' is required.
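The percentages quoted in the abstract are, in essence, the share of reference compound-patent pairs that an automatically generated database also contains. A minimal sketch of that calculation follows, assuming each pair is reduced to a (compound identifier, patent number) tuple that can be compared exactly. The function name `coverage`, the identifiers, and the toy data are all hypothetical and are not taken from the study, whose actual matching procedure is not described on this page.

```python
from typing import Set, Tuple

# A compound-patent pair: (compound identifier, patent number).
# Both fields are placeholder strings for illustration only.
Pair = Tuple[str, str]


def coverage(reference: Set[Pair], candidate: Set[Pair]) -> float:
    """Return the percentage of reference pairs also present in candidate."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return 100.0 * len(reference & candidate) / len(reference)


# Hypothetical toy data: five manually curated pairs, of which the
# automatically extracted set recovers three and adds one extra pair.
curated = {
    ("CPD-1", "PAT-A"),
    ("CPD-2", "PAT-A"),
    ("CPD-3", "PAT-B"),
    ("CPD-4", "PAT-B"),
    ("CPD-5", "PAT-C"),
}
automated = {
    ("CPD-1", "PAT-A"),
    ("CPD-3", "PAT-B"),
    ("CPD-5", "PAT-C"),
    ("CPD-9", "PAT-C"),  # not in the reference, so it does not affect coverage
}

print(f"{coverage(curated, automated):.0f}% of curated pairs recovered")
```

Pairs found only in the automated set are ignored here because the measure is taken relative to the curated reference, mirroring how the abstract reports its figures.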

X Demographics

The data shown below were collected from the profiles of 14 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 50 Mendeley readers of this research output.

Geographical breakdown

Country | Count | As %
Germany | 2 | 4%
Netherlands | 1 | 2%
Brazil | 1 | 2%
India | 1 | 2%
United Kingdom | 1 | 2%
United States | 1 | 2%
Unknown | 43 | 86%

Demographic breakdown

Readers by professional status | Count | As %
Researcher | 15 | 30%
Student > Ph. D. Student | 8 | 16%
Student > Bachelor | 6 | 12%
Student > Master | 5 | 10%
Professor > Associate Professor | 3 | 6%
Other | 3 | 6%
Unknown | 10 | 20%

Readers by discipline | Count | As %
Chemistry | 12 | 24%
Computer Science | 9 | 18%
Agricultural and Biological Sciences | 5 | 10%
Pharmacology, Toxicology and Pharmaceutical Science | 3 | 6%
Engineering | 3 | 6%
Other | 6 | 12%
Unknown | 12 | 24%
Attention Score in Context

This research output has an Altmetric Attention Score of 19. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 July 2016.

  • All research outputs: #1,769,518 of 23,577,761 outputs
  • Outputs from Journal of Cheminformatics: #155 of 874 outputs
  • Outputs of similar age: #26,211 of 279,458 outputs
  • Outputs of similar age from Journal of Cheminformatics: #2 of 14 outputs

Altmetric has tracked 23,577,761 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 92nd percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 874 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one has done well, scoring higher than 82% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 279,458 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 90% of its contemporaries.
We're also able to compare this research output to 14 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 85% of its contemporaries.
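The percentile figures above can be checked against the ranks they accompany. The sketch below assumes a percentile is simply the share of outputs ranked below this one, rounded down. That formula and the function name `percentile_from_rank` are assumptions for illustration, since the page does not state how Altmetric derives these values.

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Percentage of outputs ranked below `rank`, where rank 1 is the highest."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return 100.0 * (total - rank) / total


# (rank, total) pairs as quoted on this page.
rankings = {
    "All research outputs": (1_769_518, 23_577_761),
    "Outputs from Journal of Cheminformatics": (155, 874),
    "Outputs of similar age": (26_211, 279_458),
    "Outputs of similar age from Journal of Cheminformatics": (2, 14),
}

for label, (rank, total) in rankings.items():
    # int() truncates toward zero, i.e. rounds these positive values down.
    print(f"{label}: higher than {int(percentile_from_rank(rank, total))}%")
```

Under this assumption, the four contexts come out at 92%, 82%, 90% and 85%, matching the figures quoted in the text.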