Summary of Backward Compatibility in Attributive Explanation and Enhanced Model Training Method, by Ryuta Matsuno
Backward Compatibility in Attributive Explanation and Enhanced Model Training Method
by Ryuta Matsuno
First submitted to arXiv on: 5 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the paper's original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | Model updates are a crucial process in ML/AI systems: they improve average prediction performance, but they can also change the model's explanations. This paper introduces BCX, a metric that evaluates the backward compatibility of feature-attribution explanations between the pre- and post-update models. BCX uses practical agreement metrics to compute the average agreement between the two models' explanations on samples that both models predict correctly. The paper also proposes BCXR, a model training method that theoretically lower-bounds the agreement scores, along with a universal variant of BCXR that improves all agreement metrics by using the L2 distance between explanations. Experimental results on eight real-world datasets show that BCXR achieves superior trade-offs between predictive performance and BCX scores. (A rough illustrative sketch of a BCX-style computation appears after this table.) |
Low | GrooveSquid.com (original content) | Updating a model is important for ML/AI systems because it helps the model make better predictions, but an update can also change how the model explains its predictions. This paper addresses that problem by creating a new way to measure how well explanations stay the same after a model is updated. The authors call this new metric BCX. They also propose a new way to train models that takes explanation preservation into account. They test their methods on eight real-world datasets and show that they achieve a better balance between prediction performance and how consistent the explanations stay. |
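The summaries above describe BCX only at a high level; the paper itself defines the exact agreement metrics and attribution methods. As a rough, hypothetical illustration of the general idea (not the paper's actual definition), the sketch below averages an agreement score between two models' feature attributions over the samples that both models predict correctly. The function names and the top-k feature-overlap metric used here are assumptions made purely for illustration.

```python
# Illustrative sketch of a BCX-style score (NOT the paper's exact definition):
# average explanation agreement between pre- and post-update models, computed
# only on samples that both models classify correctly. "Agreement" here is a
# hypothetical top-k feature-overlap metric; the paper's own agreement metrics
# and attribution method may differ.
import numpy as np

def topk_agreement(attr_a: np.ndarray, attr_b: np.ndarray, k: int = 3) -> float:
    """Fraction of overlap between the top-k most important features."""
    top_a = set(np.argsort(-np.abs(attr_a))[:k])
    top_b = set(np.argsort(-np.abs(attr_b))[:k])
    return len(top_a & top_b) / k

def bcx_style_score(y_true, pred_old, pred_new, attrs_old, attrs_new, k: int = 3) -> float:
    """Average agreement over samples that both models predict correctly."""
    both_correct = (pred_old == y_true) & (pred_new == y_true)
    idx = np.where(both_correct)[0]
    if len(idx) == 0:
        return float("nan")
    scores = [topk_agreement(attrs_old[i], attrs_new[i], k) for i in idx]
    return float(np.mean(scores))

# Toy example with random attributions (stand-ins for, e.g., gradient- or
# SHAP-style per-feature attributions).
rng = np.random.default_rng(0)
n, d = 100, 10
y = rng.integers(0, 2, n)
pred_old = y.copy(); pred_old[:5] = 1 - pred_old[:5]    # old model errs on 5 samples
pred_new = y.copy(); pred_new[3:8] = 1 - pred_new[3:8]  # new model errs on 5 samples
attrs_old = rng.normal(size=(n, d))
attrs_new = attrs_old + 0.1 * rng.normal(size=(n, d))   # similar explanations
print(f"BCX-style score: {bcx_style_score(y, pred_old, pred_new, attrs_old, attrs_new):.3f}")
```

The BCXR training method described in the medium summary would go further and add a term to the training objective (the universal variant reportedly uses an L2 distance between the old and new explanations); that training-side component is not sketched here.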