Report for: Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets

Title	Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets
Published in	Journal of Cheminformatics, November 2016
DOI	10.1186/s13321-016-0163-1
Pubmed ID	27872662
Authors	Sunghwan Kim, Evan E. Bolton, Stephen H. Bryant
Abstract	PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute "neighbor" relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called "Similar Compounds" and "Similar Conformers", respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring, how different the results of the two neighboring schemes are is an important question, considering limited computational resources. The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31% and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced, which quantified whether a molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0). 
The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used.

Graphical Abstract

The binned distribution of the neighbor preference indices (NPIs) for all compounds in PubChem (left) has a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0, indicating that structural similarity between compounds in PubChem can be recognized predominantly by either 2-D or 3-D neighborings, but not by both. The NPI histogram for the drug space (right) has a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI ≈ ±1) as well as compounds with a neutral preference (at NPI ≈ 0), indicating that the drug space is very different from the PubChem space.
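As a rough illustration of the two quantities the abstract describes, the sketch below computes an overlap fraction between a compound's 2-D and 3-D neighbor sets and a neighbor preference index. The abstract does not state the NPI formula or its sign convention, so the normalized-difference form used here (positive when 3-D neighbors dominate) is an assumption, and the function names and example identifiers are hypothetical.

```python
from typing import Optional, Set


def overlap_fraction(neighbors_2d: Set[int], neighbors_3d: Set[int]) -> Optional[float]:
    """Share of all distinct neighbors that are found by both methods."""
    total = neighbors_2d | neighbors_3d
    if not total:
        return None  # no neighbors at all, so the fraction is undefined
    return len(neighbors_2d & neighbors_3d) / len(total)


def neighbor_preference_index(n_2d: int, n_3d: int) -> Optional[float]:
    """Assumed form: (n_3d - n_2d) / (n_3d + n_2d), ranging from -1 to +1.

    +1 means only 3-D neighbors, -1 means only 2-D neighbors, and 0 means
    equal counts. The sign convention is an assumption, not taken from
    the abstract.
    """
    total = n_2d + n_3d
    if total == 0:
        return None  # a compound with no neighbors has no preference
    return (n_3d - n_2d) / total


if __name__ == "__main__":
    # Hypothetical compound identifiers, for illustration only.
    two_d = {101, 102, 103}
    three_d = {103, 104}
    print(overlap_fraction(two_d, three_d))
    print(neighbor_preference_index(len(two_d), len(three_d)))
```

Under this assumed form, a histogram with maxima at ±1 corresponds to compounds whose neighbors come almost entirely from one method.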

Mendeley readers

The data shown below were compiled from readership statistics for 38 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	38	100%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	9	24%
Student > Bachelor	4	11%
Student > Ph. D. Student	3	8%
Student > Master	3	8%
Student > Doctoral Student	2	5%
Other	6	16%
Unknown	11	29%

Readers by discipline	Count	As %
Chemistry	9	24%
Biochemistry, Genetics and Molecular Biology	5	13%
Pharmacology, Toxicology and Pharmaceutical Science	4	11%
Agricultural and Biological Sciences	1	3%
Social Sciences	1	3%
Other	3	8%
Unknown	15	39%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 30 April 2017.

Context	Rank	Out of
All research outputs	#15,863,447	23,567,572
Outputs from Journal of Cheminformatics	#792	872
Outputs of similar age	#197,682	313,153
Outputs of similar age from Journal of Cheminformatics	#22	24

Altmetric has tracked 23,567,572 research outputs across all sources so far. This one is in the 22nd percentile – i.e., 22% of other outputs scored the same or lower than it.

So far Altmetric has tracked 872 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one is in the 4th percentile – i.e., 4% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 313,153 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 28th percentile – i.e., 28% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 24 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
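The percentile wording used above ("scored the same or lower than it") can be sketched as a simple count. This is a minimal illustration of that stated definition only; how Altmetric actually computes its percentiles (for example, how it handles ties or rounding) is not described here, and the function name and sample scores are hypothetical.

```python
from typing import Sequence


def percentile_same_or_lower(score: float, other_scores: Sequence[float]) -> float:
    """Percentage of the other outputs whose score is the same as or lower than `score`."""
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100.0 * same_or_lower / len(other_scores)


if __name__ == "__main__":
    # Hypothetical Attention Scores of four other outputs, not real data.
    others = [0, 1, 2, 3]
    print(percentile_same_or_lower(1, others))
```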
