Summary of Sort & Slice: a Simple and Superior Alternative to Hash-based Folding For Extended-connectivity Fingerprints, by Markus Dablander et al.
Sort & Slice: A Simple and Superior Alternative to Hash-Based Folding for Extended-Connectivity Fingerprints
by Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
First submitted to arXiv on: 10 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Chemical Physics (physics.chem-ph); Biomolecules (q-bio.BM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | High Difficulty Summary Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper introduces substructure pooling, a general mathematical framework for converting the set of substructures that an extended-connectivity fingerprint (ECFP) detects in a compound into a fixed-length, compound-level vector. Within this framework, the authors describe Sort & Slice, an alternative to traditional hash-based folding. Sort & Slice sorts ECFP substructures by their prevalence in a set of training compounds and keeps only the most frequent ones, which then define the bits of a binary fingerprint. The authors compare several substructure-pooling techniques and report that Sort & Slice robustly outperforms hash-based folding and the other methods tested across molecular property prediction tasks, data-splitting techniques, machine-learning models, and ECFP hyperparameters. |
| Low | GrooveSquid.com (original content) | Low Difficulty Summary Sort & Slice is a new way to turn the small pieces of chemical structure found by ECFPs (extended-connectivity fingerprints) into a fixed-size description of a whole molecule. This helps computers learn about chemicals and predict their properties. The usual way to do this, hash-based folding, uses a simple formula to assign each piece a position in the fingerprint. Sort & Slice instead sorts the pieces by how often they appear in a set of training compounds, then keeps the most common ones to create a binary fingerprint. This fingerprint can be used as input for machine-learning models to predict chemical properties. The authors compared Sort & Slice with other methods and found that it works better in many cases. |
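
The sort-then-select procedure described in the summaries can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each compound has already been reduced to a set of integer substructure identifiers (for example by a cheminformatics toolkit). The function names, the tie-breaking rule, the zero padding, and the use of a plain modulo as a stand-in for hash-based folding are all assumptions made for this example.

```python
from collections import Counter


def fit_sort_and_slice(training_compounds, length):
    """Return up to `length` substructure IDs, most prevalent first."""
    counts = Counter()
    for substructures in training_compounds:
        # Count each substructure at most once per training compound.
        counts.update(set(substructures))
    # Sort by descending prevalence; ties are broken by ID so the result
    # is deterministic (this tie rule is an assumption of the sketch).
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    # Slice away everything except the most frequent substructures.
    return ranked[:length]


def sort_and_slice_fingerprint(substructures, vocabulary, length):
    """Binary fingerprint: bit i is 1 if vocabulary[i] occurs in the compound."""
    present = set(substructures)
    bits = [1 if s in present else 0 for s in vocabulary]
    # Pad with zeros if the training set had fewer than `length` substructures.
    return bits + [0] * (length - len(bits))


def folded_fingerprint(substructures, length):
    """Folding baseline for contrast (modulo stands in for a real hash)."""
    bits = [0] * length
    for s in substructures:
        bits[s % length] = 1  # different IDs may land on the same bit
    return bits


# Toy data: three training compounds given as sets of substructure IDs.
train = [{11, 22, 33}, {11, 22, 44}, {11, 55}]
vocab = fit_sort_and_slice(train, length=2)
fp = sort_and_slice_fingerprint({22, 55, 99}, vocab, length=2)
folded = folded_fingerprint({22, 55, 99}, length=2)
```

In this toy run, each Sort & Slice bit corresponds to exactly one substructure from the training set, whereas the folding baseline maps IDs 55 and 99 to the same bit, which illustrates how folding can merge distinct substructures.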
Keywords
- Artificial intelligence
- Machine learning




