Summary of "When to Accept Automated Predictions and When to Defer to Human Judgment?", by Daniel Sikar et al.
When to Accept Automated Predictions and When to Defer to Human Judgment?
by Daniel Sikar, Artur Garcez, Tillman Weyde, Robin Bloomfield, Kaleem Peeroo
First submitted to arXiv on: 10 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper proposes an approach for measuring the reliability of neural network predictions under distribution shift. The authors analyze how the outputs of a trained neural network change, using clustering to measure the distance between each output and its class centroid, and propose this distance as a metric of prediction confidence. They define the safety threshold for a class as the smallest distance from an incorrect prediction to that class's centroid. The approach is evaluated on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that the proposed metric offers an efficient way to determine when automated predictions are acceptable and when they should be deferred to human operators. A minimal code sketch of this accept-or-defer rule is given after the table. |
| Low | GrooveSquid.com (original content) | This paper tackles a big problem in artificial intelligence: when machines make decisions, we want to know whether those decisions are reliable. Sometimes the data a machine sees changes from the data it was trained on, which can lead to bad decisions. The authors came up with a new way to measure how trustworthy a decision is by looking at the distance between what the machine outputs and what it typically outputs for that answer. They tested this method on two kinds of images (handwritten digits and small photos of objects such as animals and vehicles) using different kinds of neural networks. Their results show that the method works across different datasets and models, which is important for keeping automated decisions safe. |
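As a rough illustration of the rule described in the medium difficulty summary, the sketch below computes per-class centroids from a network's outputs, sets each class's safety threshold to the smallest distance from a misclassified sample to that class's centroid, and accepts or defers a new prediction accordingly. It assumes softmax output vectors, Euclidean distance, and illustrative function names (`class_centroids`, `safety_thresholds`, `accept_or_defer`); none of these details are taken from the paper itself.

```python
# Minimal sketch, not the authors' implementation. Assumes `outputs` is an
# (n_samples, n_classes) array of softmax vectors and `labels` an array of
# integer class labels, with at least one correct prediction per class.
import numpy as np

def class_centroids(outputs, labels, n_classes):
    """Mean output vector (centroid) per class, over correctly classified samples."""
    preds = outputs.argmax(axis=1)
    correct = preds == labels
    return np.stack([outputs[correct & (labels == c)].mean(axis=0)
                     for c in range(n_classes)])

def safety_thresholds(outputs, labels, centroids):
    """Per-class threshold: smallest distance from a misclassified sample
    to the centroid of the class it was (wrongly) predicted as."""
    preds = outputs.argmax(axis=1)
    wrong = preds != labels
    thresholds = np.full(centroids.shape[0], np.inf)
    for c in range(centroids.shape[0]):
        d = np.linalg.norm(outputs[wrong & (preds == c)] - centroids[c], axis=1)
        if d.size:
            thresholds[c] = d.min()
    return thresholds

def accept_or_defer(output, centroids, thresholds):
    """Accept the prediction if its distance to the predicted-class centroid
    is below that class's safety threshold; otherwise defer to a human."""
    c = int(output.argmax())
    dist = np.linalg.norm(output - centroids[c])
    return ("accept", c) if dist < thresholds[c] else ("defer", c)
```

Under these assumptions, predictions that land closer to their class centroid than any known misclassification are accepted automatically, while anything farther away is routed to a human operator.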
Keywords
» Artificial intelligence » Clustering » Machine learning » Neural network » Vision transformer