Summary of DEM: Distribution Edited Model for Training with Mixed Data Distributions, by Dhananjay Ram et al.
DEM: Distribution Edited Model for Training with Mixed Data Distributions
by Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha
First submitted to arXiv on: 21 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on arXiv. |
| Medium | GrooveSquid.com (original content) | The paper tackles the challenge of optimizing models trained on mixed data distributions, a crucial step in building multi-task and instruction-following models. The diversity of the data sources and the high cost of joint training make this optimization extremely challenging. Existing data mixing methods only partially address the problem: they perform sub-optimally across data sources and require multiple expensive training runs. The proposed Distribution Edited Model (DEM) instead trains a model on each data source individually and combines the resulting models with a base model using basic vector operations (see the sketch after this table). DEM outperforms strong baselines on benchmarks including MMLU, BBH, DROP, MathQA, and HELM, is 11x cheaper than standard data mixing, and yields up to a 16.1% improvement on DROP, making it a simple, efficient, and flexible approach to training with diverse data sources. |
| Low | GrooveSquid.com (original content) | This paper helps solve a big problem in machine learning: when we have many different types of data, it is hard to build one model that works well on all of them. Current solutions are imperfect and require a lot of computing power. The authors propose combining a separate model trained on each type of data with a base model. The resulting approach, called DEM, is faster and more effective than previous methods: it improves performance by up to 16% on some tasks and does not require re-training when data sources are added or changed. |
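To make the "basic vector operations" concrete, here is a minimal sketch of what such a merge could look like, based only on the summary above. Everything specific here is an assumption for illustration: the function name `dem_merge`, the use of PyTorch state dicts, and the uniform weighting are not taken from the paper, which may use a different combination rule.

```python
import torch

def dem_merge(base_state, expert_states, weights):
    """Combine per-source fine-tuned models with a base model:
    merged = base + sum_i w_i * (expert_i - base)."""
    merged = {}
    for name, base_param in base_state.items():
        delta = torch.zeros_like(base_param)
        for expert_state, w in zip(expert_states, weights):
            # Each difference is the "edit" that fine-tuning on one
            # data source applied to the base parameters.
            delta += w * (expert_state[name] - base_param)
        merged[name] = base_param + delta
    return merged

# Toy usage with one-tensor "models" standing in for full state dicts.
base = {"w": torch.tensor([1.0, 1.0])}
experts = [{"w": torch.tensor([2.0, 1.0])},
           {"w": torch.tensor([1.0, 3.0])}]
print(dem_merge(base, experts, weights=[0.5, 0.5]))  # {'w': tensor([1.5000, 2.0000])}
```

If the merge really is a post-hoc operation over checkpoints like this, then adding or removing a data source only means re-running the merge rather than re-training, which would explain the flexibility and cost savings the summary describes.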
Keywords
» Artificial intelligence » Machine learning » Multi task » Optimization