Summary of Quantifying Prediction Consistency Under Model Multiplicity in Tabular LLMs, by Faisal Hamman et al.
Quantifying Prediction Consistency Under Model Multiplicity in Tabular LLMs
by Faisal Hamman, Pasan Dissanayake, Saumitra Mishra, Freddy Lecue, Sanghamitra Dutta
First submitted to arXiv on: 4 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper formalizes the challenge of “fine-tuning multiplicity” in large language models (LLMs) used for tabular classification tasks. This phenomenon arises from variations in the training process, which can yield equally well-performing fine-tuned models that make conflicting predictions on the same inputs. The authors propose a novel metric to quantify the robustness of individual predictions without expensive model retraining. The metric analyzes the local behavior of the model around the input in the embedding space and leverages Bernstein’s Inequality to provide probabilistic robustness guarantees against a broad class of fine-tuned models. Empirical evaluation on real-world datasets supports the theoretical results, highlighting the importance of addressing fine-tuning instabilities for trustworthy deployment in high-stakes applications. An illustrative sketch of such a consistency measure appears after this table. |
| Low | GrooveSquid.com (original content) | The paper looks at a problem with big language models when they’re used to classify data from tables. Sometimes, models that were trained only slightly differently end up making conflicting predictions on the same input, which makes it hard to trust what any one model is saying. The authors come up with a new way to measure how stable a model’s prediction is, without having to retrain lots of models. They show that their method works well in practice and could be important for using these models in places where the stakes are high. |
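To make the medium-difficulty description more concrete, the sketch below shows one way a prediction-consistency score of this flavor could be approximated: sample small perturbations of the input's embedding, measure how often the classifier keeps the same label, and lower-bound the true agreement rate with an empirical Bernstein-style bound. This is a minimal sketch, not the authors' implementation: the function `predict_from_embedding`, the noise scale `sigma`, and the sample count `n_samples` are illustrative assumptions, and the bound used here is the standard Maurer–Pontil empirical Bernstein inequality rather than the paper's exact guarantee.

```python
import numpy as np

def consistency_score(x_embedding, predict_from_embedding, n_samples=200,
                      sigma=0.05, delta=0.05, seed=0):
    """Estimate how often the prediction at `x_embedding` survives small
    Gaussian perturbations in embedding space, and return a probabilistic
    lower bound on that agreement rate (holds with prob. >= 1 - delta)."""
    rng = np.random.default_rng(seed)
    base_pred = predict_from_embedding(x_embedding)

    # Monte Carlo agreement indicators: 1 if a perturbed embedding keeps the same label.
    agreements = np.empty(n_samples)
    for i in range(n_samples):
        noisy = x_embedding + sigma * rng.standard_normal(x_embedding.shape)
        agreements[i] = float(predict_from_embedding(noisy) == base_pred)

    mean = agreements.mean()
    var = agreements.var(ddof=1)

    # Empirical Bernstein bound (Maurer & Pontil): with probability >= 1 - delta,
    # the true agreement rate is at least `lower`.
    log_term = np.log(2.0 / delta)
    lower = mean - np.sqrt(2.0 * var * log_term / n_samples) \
                 - 7.0 * log_term / (3.0 * (n_samples - 1))
    return mean, max(lower, 0.0)

# Hypothetical usage: mean_agreement, lower_bound = consistency_score(emb, clf_predict)
```

In this sketch, a lower bound close to 1 would suggest the prediction is stable under the sampled perturbations, while a low bound flags an input whose label may flip across similarly fine-tuned models.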
Keywords
- Artificial intelligence
- Classification
- Embedding space
- Fine-tuning