- 3/4 A huge issue is bit collisions. Fingerprints with high bit occupancy (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts toward high Tanimoto scores, and (3) very different handling of small and large molecules. --> Consider using sparse fingerprints! --> Morgan >> MAP4 / RDKit (Jun 23, 2025 09:22)
- 4/4 We also highlight options for count fingerprints, such as log-counts and IDF-weighted counts. The latter can be used to adjust bit importance to a dataset of your choice. An example use case is chemical space visualization. Preprint: www.biorxiv.org/content/10.1...
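
A toy sketch of the two points above: how folding into a fixed number of bits can inflate Tanimoto scores, and how log-counts and IDF weights can be applied to sparse count fingerprints. Everything here is an assumption for illustration, not the preprint's implementation: fingerprints are plain dicts of hypothetical feature IDs to counts, log-counts are taken as log(1 + count), IDF is the textbook log(N / document frequency), and the weighted similarity is the generalized (min/max) Tanimoto.

```python
import math
from collections import Counter


def fold(sparse_fp, n_bits):
    """Fold a sparse fingerprint {feature_id: count} into on-bit indices modulo n_bits."""
    return {feature_id % n_bits for feature_id in sparse_fp}


def tanimoto_binary(bits_a, bits_b):
    """Classic Tanimoto on two sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def log_counts(sparse_fp):
    """Dampen large counts; log(1 + count) is an assumed scaling."""
    return {f: math.log1p(c) for f, c in sparse_fp.items()}


def idf_weights(dataset):
    """Textbook IDF: log(N / number of molecules containing the feature)."""
    n = len(dataset)
    doc_freq = Counter(f for fp in dataset for f in fp)
    return {f: math.log(n / d) for f, d in doc_freq.items()}


def weighted_tanimoto(fp_a, fp_b, weights=None):
    """Generalized Tanimoto: sum(w * min) / sum(w * max) over all features."""
    num = den = 0.0
    for f in fp_a.keys() | fp_b.keys():
        w = 1.0 if weights is None else weights.get(f, 0.0)
        x, y = fp_a.get(f, 0.0), fp_b.get(f, 0.0)
        num += w * min(x, y)
        den += w * max(x, y)
    return num / den if den else 0.0


# Hypothetical feature IDs, chosen so that several collide modulo 1024.
mol_a = {3: 2, 1027: 1, 2051: 1}
mol_b = {3: 1, 5: 1, 4099: 1}
mol_c = {3: 1, 5: 2}

sparse_score = tanimoto_binary(set(mol_a), set(mol_b))
folded_score = tanimoto_binary(fold(mol_a, 1024), fold(mol_b, 1024))
count_score = weighted_tanimoto(mol_a, mol_b)
log_score = weighted_tanimoto(log_counts(mol_a), log_counts(mol_b))
idf = idf_weights([mol_a, mol_b, mol_c])
idf_score = weighted_tanimoto(mol_a, mol_b, idf)

print(sparse_score, folded_score, count_score, log_score, idf_score)
```

In this toy case the folded score is higher than the sparse one purely because unrelated features land on the same bit. Feature 3 occurs in every molecule of the toy dataset, so its IDF weight is zero and it no longer contributes to the similarity.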