
Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?

Overview of attention for article published in Journal of Cheminformatics, May 2015

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • Among the highest-scoring outputs from this source (#28 of 935)
  • High Attention Score compared to outputs of the same age (95th percentile)
  • High Attention Score compared to outputs of the same age and source (94th percentile)

Citations

868 Dimensions

Readers on

1072 Mendeley
1 CiteULike
Published in
Journal of Cheminformatics, May 2015
DOI 10.1186/s13321-015-0069-3
Pubmed ID
Authors

Dávid Bajusz, Anita Rácz, Károly Héberger

Abstract

Cheminformaticians are equipped with a very rich toolbox when carrying out molecular similarity calculations. A large number of molecular representations exist, and there are several methods (similarity and distance metrics) to quantify the similarity of molecular representations. In this work, eight well-known similarity/distance metrics are compared on a large dataset of molecular fingerprints with sum of ranking differences (SRD) and ANOVA analysis. The effects of molecular size, selection methods and data pretreatment methods on the outcome of the comparison are also assessed.

A supplier database (https://mcule.com/) was used as the source of compounds for the similarity calculations in this study. A large number of datasets, each consisting of one hundred compounds, were compiled, molecular fingerprints were generated and similarity values between a randomly chosen reference compound and the rest were calculated for each dataset. Similarity metrics were compared based on their ranking of the compounds within one experiment (one dataset) using sum of ranking differences (SRD), while the results of the entire set of experiments were summarized on box and whisker plots. Finally, the effects of various factors (data pretreatment, molecule size, selection method) were evaluated with analysis of variance (ANOVA).

This study complements previous efforts to examine and rank various metrics for molecular similarity calculations. Here, however, an entirely general approach was taken to neglect any a priori knowledge on the compounds involved, as well as any bias introduced by examining only one or a few specific scenarios. The Tanimoto index, Dice index, Cosine coefficient and Soergel distance were identified to be the best (and in some sense equivalent) metrics for similarity calculations, i.e. these metrics could produce the rankings closest to the composite (average) ranking of the eight metrics. The similarity metrics derived from Euclidean and Manhattan distances are not recommended on their own, although their variability and diversity from other similarity metrics might be advantageous in certain cases (e.g. for data fusion). Conclusions are also drawn regarding the effects of molecule size, selection method and data pretreatment on the ranking behavior of the studied metrics.

Graphical Abstract: A visual summary of the comparison of similarity metrics with sum of ranking differences (SRD).
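To make the comparison described in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the six metrics named above for binary fingerprints and then scores each metric with a simplified sum of ranking differences against the average of all metrics. Several things here are assumptions for illustration: fingerprints are represented as sets of on-bit indices, distances are turned into similarities with 1 / (1 + d), the reference ranking is the plain row average with no data pretreatment, and the function names (`similarity_scores`, `average_ranks`, `srd_values`) are hypothetical. The validation steps that normally accompany SRD are omitted.

```python
from math import sqrt


def similarity_scores(fp_a, fp_b):
    """Similarity values for two binary fingerprints given as sets of on-bit indices.

    a and b are the on-bit counts of each fingerprint, c is the shared on-bit count.
    """
    a, b, c = len(fp_a), len(fp_b), len(fp_a & fp_b)
    if a == 0 or b == 0:
        raise ValueError("fingerprints must have at least one bit set")

    tanimoto = c / (a + b - c)
    dice = 2 * c / (a + b)
    cosine = c / sqrt(a * b)

    # Distances: for binary data, Soergel distance is 1 - Tanimoto,
    # Manhattan is the number of differing bits, Euclidean is its square root.
    soergel = 1 - tanimoto
    manhattan = a + b - 2 * c
    euclidean = sqrt(manhattan)

    # Assumption: distances are converted to similarities as 1 / (1 + d).
    return {
        "tanimoto": tanimoto,
        "dice": dice,
        "cosine": cosine,
        "soergel_sim": 1 / (1 + soergel),
        "manhattan_sim": 1 / (1 + manhattan),
        "euclidean_sim": 1 / (1 + euclidean),
    }


def average_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd_values(scores_by_metric):
    """Simplified SRD: sum of |rank by metric - rank by row average| over compounds."""
    names = list(scores_by_metric)
    n = len(scores_by_metric[names[0]])
    # Assumption: the reference is the unscaled average over all metrics.
    reference = [
        sum(scores_by_metric[m][i] for m in names) / len(names) for i in range(n)
    ]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(r - q) for r, q in zip(average_ranks(scores_by_metric[m]), ref_ranks))
        for m in names
    }


if __name__ == "__main__":
    # Toy data: one reference fingerprint and a small "dataset" of four compounds.
    reference_fp = {1, 4, 7, 9, 12}
    dataset = [{1, 4, 7, 9}, {1, 4, 20, 21, 22, 23}, {7, 30}, {1, 4, 7, 9, 12, 40, 41}]

    scores = {}
    for fp in dataset:
        for name, value in similarity_scores(reference_fp, fp).items():
            scores.setdefault(name, []).append(value)

    for name, value in sorted(srd_values(scores).items(), key=lambda kv: kv[1]):
        print(f"{name:14s} SRD = {value}")
```

A smaller SRD value means a metric ranks the compounds more like the consensus does. With real fingerprints, a cheminformatics toolkit would normally supply both the bit vectors and the similarity functions; the set-based version above only keeps the arithmetic visible.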

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 1,072 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Germany 8 <1%
Brazil 6 <1%
United Kingdom 4 <1%
Spain 3 <1%
United States 3 <1%
South Africa 1 <1%
Indonesia 1 <1%
Denmark 1 <1%
Portugal 1 <1%
Other 2 <1%
Unknown 1042 97%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 254 24%
Researcher 147 14%
Student > Master 141 13%
Student > Bachelor 116 11%
Student > Doctoral Student 40 4%
Other 140 13%
Unknown 234 22%

Readers by discipline Count As %
Chemistry 220 21%
Biochemistry, Genetics and Molecular Biology 143 13%
Agricultural and Biological Sciences 102 10%
Computer Science 82 8%
Pharmacology, Toxicology and Pharmaceutical Science 67 6%
Other 178 17%
Unknown 280 26%

Attention Score in Context

This research output has an Altmetric Attention Score of 42. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 November 2023.
All research outputs: #949,110 of 24,932,434 outputs
Outputs from Journal of Cheminformatics: #28 of 935 outputs
Outputs of similar age: #11,483 of 271,807 outputs
Outputs of similar age from Journal of Cheminformatics: #2 of 19 outputs
Altmetric has tracked 24,932,434 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 96th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 935 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.3. This one has done particularly well, scoring higher than 97% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 271,807 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 95% of its contemporaries.
We're also able to compare this research output to 19 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 94% of its contemporaries.