Title | Similar compounds versus similar conformers: complementarity between PubChem 2-D and 3-D neighboring sets |
---|---|
Published in | Journal of Cheminformatics, November 2016 |
DOI | 10.1186/s13321-016-0163-1 |
Pubmed ID | |
Authors | Sunghwan Kim, Evan E. Bolton, Stephen H. Bryant |
Abstract | PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute "neighbor" relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called "Similar Compounds" and "Similar Conformers", respectively) for each compound in PubChem. However, because 3-D neighboring is much more time-consuming than 2-D neighboring, how different the results of the two neighboring schemes are is an important question, considering limited computational resources. The present study analyzed the complementarity between the PubChem 2-D and 3-D neighbors. When all compounds in PubChem were considered, the overlap between 2-D and 3-D neighbors was only 2% of the total neighbors. For the data sets containing compounds with annotated information, the overlap increased as the data sets became smaller. However, it did not exceed 31% and substantial fractions of neighbors were still recognized by either PubChem 2-D or 3-D similarity, but not by both. The Neighbor Preference Index (NPI) of a molecule for a given data set was introduced, which quantified whether a molecule had more 2-D or 3-D neighbors in the data set. The NPI histogram for all PubChem compounds had a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0. However, the NPI histograms for the subsets containing compounds with annotated information had a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI = ±1) as well as compounds with a neutral preference (at NPI = 0). The results of our study indicate that, for the majority of the compounds in PubChem, their structural similarity to other compounds can be recognized predominantly by either 2-D or 3-D neighborings, but not by both, showing a strong complementarity between 2-D and 3-D neighboring results. Therefore, despite its heavy requirements for computational resources, 3-D neighboring provides an alternative way in which the user can instantly access structurally similar molecules that cannot be detected if only 2-D neighboring is used. Graphical Abstract: The binned distribution of the neighbor preference indices (NPIs) for all compounds in PubChem (left) has a bimodal shape with two maxima at NPI = ±1 and a minimum at NPI = 0, indicating that structural similarity between compounds in PubChem can be recognized predominantly by either 2-D or 3-D neighborings, but not by both. The NPI histogram for the drug space (right) has a greater fraction of compounds with a strong preference for one neighboring method to the other (at NPI ≈ ±1) as well as compounds with a neutral preference (at NPI ≈ 0), indicating that the drug space is very different from the PubChem space. |
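
The two quantities discussed in the abstract, the overlap between 2-D and 3-D neighbors and the Neighbor Preference Index, can be illustrated with a minimal Python sketch. The abstract does not give the NPI formula, so the form used here, (N3D − N2D) / (N3D + N2D), is an assumption chosen only because it spans the stated range of −1 to +1 with 0 as the neutral point; the sign convention (which method maps to +1) is likewise assumed. The function names and the compound identifiers are hypothetical, and the overlap is computed here for a single molecule's neighbor lists rather than over a whole data set.

```python
def neighbor_overlap(neighbors_2d, neighbors_3d):
    """Fraction of all distinct neighbors that both methods recognize.

    Both arguments are sets of compound identifiers (hypothetical values).
    """
    union = neighbors_2d | neighbors_3d
    if not union:
        return 0.0  # no neighbors at all, so no overlap to report
    return len(neighbors_2d & neighbors_3d) / len(union)


def neighbor_preference_index(n_2d, n_3d):
    """Assumed NPI form: (n_3d - n_2d) / (n_3d + n_2d).

    Under this assumption, +1 means only 3-D neighbors, -1 means only
    2-D neighbors, and 0 means equal counts. Returns None when the
    molecule has no neighbors, because the ratio is undefined.
    """
    total = n_2d + n_3d
    if total == 0:
        return None
    return (n_3d - n_2d) / total


# Hypothetical neighbor lists for one molecule.
similar_compounds = {101, 102, 103}   # 2-D neighbors
similar_conformers = {103, 104}       # 3-D neighbors

overlap = neighbor_overlap(similar_compounds, similar_conformers)
npi = neighbor_preference_index(len(similar_compounds), len(similar_conformers))
print(f"overlap = {overlap:.2f}, NPI = {npi:+.2f}")
```

Under this assumed form, a molecule whose neighbors are found by only one method lands at exactly ±1, which is where the abstract reports the two maxima of the histogram for all PubChem compounds.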
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 38 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 9 | 24% |
Student > Bachelor | 4 | 11% |
Student > Ph. D. Student | 3 | 8% |
Student > Master | 3 | 8% |
Student > Doctoral Student | 2 | 5% |
Other | 6 | 16% |
Unknown | 11 | 29% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 9 | 24% |
Biochemistry, Genetics and Molecular Biology | 5 | 13% |
Pharmacology, Toxicology and Pharmaceutical Science | 4 | 11% |
Agricultural and Biological Sciences | 1 | 3% |
Social Sciences | 1 | 3% |
Other | 3 | 8% |
Unknown | 15 | 39% |