Impact of similarity threshold on the topology of molecular similarity networks and clustering outcomes

Overview of attention for article published in Journal of Cheminformatics, March 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (73rd percentile)
  • Good Attention Score compared to outputs of the same age and source (66th percentile)

Mentioned by

X (Twitter): 3 users
Patents: 1 patent

Readers on

Mendeley: 58 readers
Title
Impact of similarity threshold on the topology of molecular similarity networks and clustering outcomes
Published in
Journal of Cheminformatics, March 2016
DOI
10.1186/s13321-016-0127-5
Authors

Gergely Zahoránszky-Kőhalmi, Cristian G. Bologa, Tudor I. Oprea

Abstract

Complex network theory based methods and the emergence of "Big Data" have reshaped the terrain of investigating structure-activity relationships of molecules. This change gave rise to new methods which need to face an important challenge, namely: how to restructure a large molecular dataset into a network that best serves the purpose of the subsequent analyses. With special focus on network clustering, our study addresses this open question by proposing a data transformation method and a clustering framework. Using the WOMBAT and PubChem MLSMR datasets we investigated the relation between varying the similarity threshold applied on the similarity matrix and the average clustering coefficient of the emerging similarity-based networks. These similarity networks were then clustered with the InfoMap algorithm. We devised a systematic method to generate so-called "pseudo-reference" clustering datasets which compensate for the lack of large-scale reference datasets. With help from the clustering framework we were able to observe the effects of varying the similarity threshold and its consequence on the average clustering coefficient and the clustering performance. We observed that the average clustering coefficient versus similarity threshold function can be characterized by the presence of a peak that covers a range of similarity threshold values. This peak is preceded by a steep decline in the number of edges of the similarity network. The maximum of this peak is well aligned with the best clustering outcome. Thus, if no reference set is available, choosing the similarity threshold associated with this peak would be a near-ideal setting for the subsequent network cluster analysis. The proposed method can be used as a general approach to determine the appropriate similarity threshold to generate the similarity network of large-scale molecular datasets.
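The threshold sweep described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`threshold_graph`, `average_clustering`, `sweep`, `interior_peak`) and the toy similarity matrix are hypothetical, the InfoMap clustering step is omitted, and the rule used here to locate the peak (the highest interior local maximum, which skips the trivial value of 1.0 reached when the graph is still complete) is an assumption, since the abstract does not specify how the peak is identified.

```python
from itertools import combinations


def threshold_graph(sim, t):
    """Build adjacency sets: nodes i and j are linked iff sim[i][j] >= t."""
    n = len(sim)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if sim[i][j] >= t:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def sweep(sim, thresholds):
    """Return (threshold, edge count, average clustering coefficient) rows."""
    rows = []
    for t in thresholds:
        adj = threshold_graph(sim, t)
        edges = sum(len(nbrs) for nbrs in adj) // 2
        rows.append((t, edges, average_clustering(adj)))
    return rows


def interior_peak(rows):
    """Threshold of the highest interior local maximum, or None if absent.

    Assumption: the endpoints are ignored so that the complete graph at
    very low thresholds (coefficient 1.0) is not mistaken for the peak.
    """
    peaks = [
        rows[i]
        for i in range(1, len(rows) - 1)
        if rows[i][2] > rows[i - 1][2] and rows[i][2] >= rows[i + 1][2]
    ]
    return max(peaks, key=lambda r: r[2])[0] if peaks else None


if __name__ == "__main__":
    # Hypothetical 4-molecule similarity matrix, for illustration only.
    toy_sim = [
        [1.0, 0.9, 0.8, 0.4],
        [0.9, 1.0, 0.7, 0.2],
        [0.8, 0.7, 1.0, 0.1],
        [0.4, 0.2, 0.1, 1.0],
    ]
    rows = sweep(toy_sim, [0.0, 0.3, 0.5, 0.85])
    for t, edges, acc in rows:
        print(t, edges, round(acc, 3))
    print("suggested threshold:", interior_peak(rows))
```

On real data the similarity matrix would come from molecular fingerprints, and the selected threshold would then be used to build the network passed to a clustering algorithm such as InfoMap.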

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 58 Mendeley readers of this research output.

Geographical breakdown

Country     Count   As %
India       1       2%
Bulgaria    1       2%
Brazil      1       2%
Unknown     55      95%

Demographic breakdown

Readers by professional status                        Count   As %
Student > Master                                      13      22%
Researcher                                            10      17%
Student > Ph. D. Student                              10      17%
Student > Doctoral Student                            5       9%
Lecturer                                              3       5%
Other                                                 9       16%
Unknown                                               8       14%

Readers by discipline                                 Count   As %
Chemistry                                             14      24%
Computer Science                                      11      19%
Agricultural and Biological Sciences                  6       10%
Biochemistry, Genetics and Molecular Biology          4       7%
Pharmacology, Toxicology and Pharmaceutical Science   2       3%
Other                                                 10      17%
Unknown                                               11      19%

Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 15 June 2021.
All research outputs: #5,552,748 of 22,858,915 outputs
Outputs from Journal of Cheminformatics: #463 of 836 outputs
Outputs of similar age: #79,010 of 300,631 outputs
Outputs of similar age from Journal of Cheminformatics: #5 of 15 outputs
Altmetric has tracked 22,858,915 research outputs across all sources so far. Compared to these, this one has done well and is in the 75th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 836 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 11.0. This one is in the 44th percentile – i.e., 44% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 300,631 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 73% of its contemporaries.
We're also able to compare this research output to 15 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 66% of its contemporaries.
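The percentiles quoted above are consistent with a simple rank-based calculation. The sketch below is an assumption rather than Altmetric's documented formula (in particular, how ties are handled is not stated on this page); the helper name `percentile_from_rank` is hypothetical. It takes the share of outputs ranked below a given position and rounds down.

```python
def percentile_from_rank(rank, total):
    """Percent of outputs ranked below `rank` (1 = most attention), rounded down.

    Assumed formula: floor(100 * (total - rank) / total), using integer maths.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) * 100 // total


if __name__ == "__main__":
    # Ranks and totals as listed on this page.
    contexts = {
        "All research outputs": (5_552_748, 22_858_915),
        "Outputs from Journal of Cheminformatics": (463, 836),
        "Outputs of similar age": (79_010, 300_631),
        "Outputs of similar age from Journal of Cheminformatics": (5, 15),
    }
    for label, (rank, total) in contexts.items():
        print(label, percentile_from_rank(rank, total))
```

Under this assumption the four contexts give 75, 44, 73 and 66, matching the percentiles stated in the text.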