Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets

Overview of attention for article published in Journal of Cheminformatics, November 2016

Mentioned by

1 Google+ user

Citations

14 Dimensions

Readers on

38 Mendeley
Title
Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets
Published in
Journal of Cheminformatics, November 2016
DOI 10.1186/s13321-016-0163-1
Authors

Sunghwan Kim, Evan E. Bolton, Stephen H. Bryant

Abstract

PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute "neighbor" relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called "Similar Compounds" and "Similar Conformers", respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring and computational resources are limited, how much the results of the two neighboring schemes differ is an important question.

The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31%, and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced to quantify whether the molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method over the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0).
The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used.

Graphical Abstract

The binned distribution of the neighbor preference indices (NPIs) for all compounds in PubChem (left) has a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0, indicating that structural similarity between compounds in PubChem can be recognized predominantly by either 2-D or 3-D neighborings, but not by both. The NPI histogram for the drug space (right) has a greater fraction of compounds with a strong preference for one neighboring method over the other (at NPI ≈ ±1) as well as compounds with a neutral preference (at NPI ≈ 0), indicating that the drug space is very different from the PubChem space.
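
The two quantities the abstract relies on, the overlap between neighbor sets and the Neighbor Preference Index, can be illustrated with a short sketch. The abstract does not give either formula, so both definitions here are assumptions: the overlap is taken as shared neighbors divided by all distinct neighbors, and the NPI as a normalized difference of neighbor counts whose sign convention (positive means more 3-D neighbors) is chosen only for illustration. The function names and compound IDs are hypothetical.

```python
from typing import Optional, Set


def overlap_fraction(neighbors_2d: Set[int], neighbors_3d: Set[int]) -> float:
    """Share of all distinct neighbors that both methods found.

    Assumption: "total neighbors" means the union of the two neighbor sets.
    """
    union = neighbors_2d | neighbors_3d
    if not union:
        return 0.0
    return len(neighbors_2d & neighbors_3d) / len(union)


def neighbor_preference_index(
    neighbors_2d: Set[int], neighbors_3d: Set[int]
) -> Optional[float]:
    """Illustrative NPI: normalized difference of neighbor counts.

    Assumption: (N_3D - N_2D) / (N_3D + N_2D), so +1 means only 3-D
    neighbors, -1 means only 2-D neighbors, and 0 means equal counts.
    Returns None when the molecule has no neighbors at all.
    """
    total = len(neighbors_2d) + len(neighbors_3d)
    if total == 0:
        return None
    return (len(neighbors_3d) - len(neighbors_2d)) / total


if __name__ == "__main__":
    # Hypothetical compound IDs, for illustration only.
    n2d = {101, 102, 103, 104}
    n3d = {104, 105}
    print(overlap_fraction(n2d, n3d))
    print(neighbor_preference_index(n2d, n3d))
```

In this toy case one of the five distinct neighbors is shared, so the overlap is small, and the index is negative because the molecule has more 2-D than 3-D neighbors. Under these assumed definitions, a molecule whose neighbors come from only one method sits at an index of ±1, which is where the abstract reports the histogram maxima for all PubChem compounds.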

Mendeley readers

The data shown below were compiled from readership statistics for 38 Mendeley readers of this research output.

Geographical breakdown

Country    Count   As %
Unknown    38      100%

Demographic breakdown

Readers by professional status    Count   As %
Researcher                        9       24%
Student > Bachelor                4       11%
Student > Ph. D. Student          3       8%
Student > Master                  3       8%
Student > Doctoral Student        2       5%
Other                             6       16%
Unknown                           11      29%

Readers by discipline                                  Count   As %
Chemistry                                              9       24%
Biochemistry, Genetics and Molecular Biology           5       13%
Pharmacology, Toxicology and Pharmaceutical Science    4       11%
Agricultural and Biological Sciences                   1       3%
Social Sciences                                        1       3%
Other                                                  3       8%
Unknown                                                15      39%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 30 April 2017.
All research outputs: #15,863,447 of 23,567,572 outputs
Outputs from Journal of Cheminformatics: #792 of 872 outputs
Outputs of similar age: #197,682 of 313,153 outputs
Outputs of similar age from Journal of Cheminformatics: #22 of 24 outputs
Altmetric has tracked 23,567,572 research outputs across all sources so far. This one is in the 22nd percentile – i.e., 22% of other outputs scored the same or lower than it.
So far Altmetric has tracked 872 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one is in the 4th percentile – i.e., 4% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 313,153 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 28th percentile – i.e., 28% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 24 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.