Report for: PubChem chemical structure standardization


Title	PubChem chemical structure standardization
Published in	Journal of Cheminformatics, August 2018
DOI	10.1186/s13321-018-0293-8
Pubmed ID	30097821
Authors	Volker D. Hähnke, Sunghwan Kim, Evan E. Bolton
Abstract	PubChem is a chemical information repository consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored in Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes their success rates, the reasons structures are rejected, and the modifications applied to structures during standardization. Furthermore, PubChem standardization is compared with the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure. The observed rejection rate for substances processed by PubChem standardization was 0.36%, predominantly attributed to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in Substance to 45,808,881 in Compound, as identified by de-aromatized canonical isomeric SMILES. Even though processing time is very low on average (only 0.4% of structures have an individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem Substance is spent on the 2.05% of structures with the highest individual standardization times. Notably, 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI, primarily because of preferences for a different tautomeric form. Standardization of chemical structures is complicated by the diversity of chemical information and of the approaches used to represent it.
The PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid or incomplete structures. Further development will concentrate on improved handling of tautomers and an expanded stereocenter definition. Modifications to the standardization are difficult to validate thoroughly, because slight changes often affect many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource (https://pubchem.ncbi.nlm.nih.gov/standardize) and via programmatic interfaces.
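Two of the figures in the abstract are simple ratios that can be reproduced directly. The following is an illustrative sketch only, not PubChem's code: `unique_reduction` computes how much the count of unique structures shrinks (with the reported counts, about 14.5%), and `share_of_time_in_slowest` computes the kind of concentration statistic quoted for standardization time, i.e. the share of total time spent on the slowest fraction of items. Both function names are hypothetical, and the timings in the usage lines are made-up toy values, not PubChem measurements.

```python
import math


def unique_reduction(before: int, after: int) -> float:
    """Fraction by which a count of unique structures shrinks after standardization."""
    if before <= 0 or not 0 <= after <= before:
        raise ValueError("expected before > 0 and 0 <= after <= before")
    return (before - after) / before


def share_of_time_in_slowest(times: list[float], fraction: float) -> float:
    """Share of the total time spent on the slowest `fraction` of items.

    The number of items counted is rounded up, so at least one item is included.
    """
    if not times or not 0 < fraction <= 1:
        raise ValueError("need non-empty times and 0 < fraction <= 1")
    ordered = sorted(times, reverse=True)  # slowest items first
    k = max(1, math.ceil(fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


# Counts reported in the abstract (unique structures in Substance vs. Compound).
print(f"{unique_reduction(53_574_724, 45_808_881):.1%}")  # 14.5%

# Hypothetical per-structure timings in seconds: many fast cases, two slow edge cases.
toy_times = [0.01] * 6 + [1.0, 3.0]
print(f"{share_of_time_in_slowest(toy_times, 0.25):.1%}")  # 98.5%
```

Sorting in descending order before taking a prefix is what turns the ratio into a "slowest k%" statistic; the toy data show how a small number of edge cases can dominate the total even when the typical item is fast.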


X Demographics

The data shown below were collected from the profiles of 30 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	5	17%
United Kingdom	4	13%
Sweden	3	10%
France	2	7%
Japan	1	3%
Brazil	1	3%
Israel	1	3%
Unknown	13	43%

Demographic breakdown

Type	Count	As %
Members of the public	13	43%
Scientists	13	43%
Science communicators (journalists, bloggers, editors)	4	13%

Mendeley readers

The data shown below were compiled from readership statistics for 185 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	185	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	33	18%
Student > Bachelor	19	10%
Student > Master	16	9%
Student > Ph. D. Student	12	6%
Other	9	5%
Other	23	12%
Unknown	73	39%

Readers by discipline	Count	As %
Chemistry	44	24%
Pharmacology, Toxicology and Pharmaceutical Science	16	9%
Biochemistry, Genetics and Molecular Biology	12	6%
Agricultural and Biological Sciences	9	5%
Unspecified	5	3%
Other	17	9%
Unknown	82	44%

Attention Score in Context

This research output has an Altmetric Attention Score of 31. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 September 2023.

Comparison set	Rank	Out of
All research outputs	#1,286,505	25,713,737
Outputs from Journal of Cheminformatics	#59	981
Outputs of similar age	#26,440	342,290
Outputs of similar age from Journal of Cheminformatics		19

Altmetric has tracked 25,713,737 research outputs across all sources so far. Compared with these, this one has done particularly well and is in the 94th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.

So far Altmetric has tracked 981 research outputs from this source. They typically receive more attention than average, with a mean Attention Score of 10.0. This one has done particularly well, scoring higher than 93% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 342,290 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 92% of its contemporaries.

We're also able to compare this research output to 19 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 94% of its contemporaries.
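The "scoring higher than X%" figures can be reproduced from the ranks listed earlier on this page with a simple rank-based formula, floor(100 × (N - rank) / N). This is an assumption that happens to match the three comparisons for which a rank is shown (94%, 93%, and 92%); Altmetric does not state its exact method here, and its handling of ties may differ. The function name `percent_ranked_below` is hypothetical.

```python
def percent_ranked_below(rank: int, total: int) -> int:
    """Whole-percent share of outputs ranked below `rank` out of `total`.

    Assumption: a plain rank-based percentile, floored; not Altmetric's documented method.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must satisfy 1 <= rank <= total")
    # Integer arithmetic avoids floating-point rounding at percent boundaries.
    return 100 * (total - rank) // total


comparisons = {
    "All research outputs": (1_286_505, 25_713_737),
    "Outputs from Journal of Cheminformatics": (59, 981),
    "Outputs of similar age": (26_440, 342_290),
}
for name, (rank, total) in comparisons.items():
    print(f"{name}: higher than {percent_ranked_below(rank, total)}%")
```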
