Summary of Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model, by Zichang Liu et al.
Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model
by Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao
First submitted to arXiv on: 21 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Recent advancements in foundation models have led to impressive performance across a wide range of tasks. For specific applications, however, practitioners often develop specialized application models, which are more efficient to serve. To enjoy the benefits of both, a natural approach is to transfer knowledge from the foundation model to the application model, for example with knowledge distillation, where the application model learns to mimic the foundation model. However, the two kinds of models have substantial gaps in capacity, use distinct architectures, and take different input features from different modalities, which makes distillation challenging. This work proposes forming a teaching committee consisting of foundation model teachers and complementary teachers whose characteristics are closer to the student’s, bridging the gap between the models. The authors also introduce DiverseDistill, which allows the student to understand each teacher’s expertise and extract task knowledge (a hedged code sketch of this kind of multi-teacher objective appears below the table). Evaluations show that adding complementary teachers improves student performance, and that DiverseDistill outperforms baseline distillation methods, yielding significantly better students. |
Low | GrooveSquid.com (original content) | This paper is about how we can take a general model that works well on many tasks and apply what it knows to one specific task, like recognizing faces or translating languages. We want to do this because the general model may be too big and inefficient for that specific task. One way is to teach a smaller model to mimic the general model’s behavior. However, the two models are very different in their architecture and in what they learn from, which makes it hard to transfer knowledge from one to the other. The authors propose creating a committee of teachers that includes both the general model and other models that are more like the smaller, task-specific model, which helps bridge the gap. They also introduce a new method called DiverseDistill, which lets the student model understand what each teacher is good at and learn from it. The results show that this approach improves the student model’s performance. |
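To make the committee idea concrete, here is a minimal sketch of a multi-teacher distillation objective in PyTorch. The per-example gating weights, temperature, and mixing coefficient `alpha` are illustrative assumptions, not the paper’s exact DiverseDistill formulation; the sketch only shows the general pattern of distilling a student from a foundation-model teacher plus complementary teachers.

```python
# Hypothetical multi-teacher ("committee") distillation loss.
# The gating weights, temperature, and alpha are illustrative assumptions,
# not the paper's exact DiverseDistill objective.
import torch
import torch.nn.functional as F


def committee_distillation_loss(student_logits, teacher_logits_list,
                                labels, gate_weights,
                                temperature=2.0, alpha=0.5):
    """student_logits:      [batch, classes] from the application (student) model
    teacher_logits_list: one [batch, classes] tensor per teacher
                         (foundation model + complementary teachers)
    gate_weights:        [batch, num_teachers] softmaxed per-example weights,
                         e.g. from a small gating network over student features
    """
    # Supervised task loss on the ground-truth labels.
    task_loss = F.cross_entropy(student_logits, labels)

    # Weighted KL divergence from the student to each teacher's softened output.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill_loss = student_logits.new_zeros(())
    for t, teacher_logits in enumerate(teacher_logits_list):
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)
        distill_loss = distill_loss + (gate_weights[:, t] * kl).mean()
    distill_loss = distill_loss * (temperature ** 2)

    # Blend the supervised loss with the committee distillation loss.
    return alpha * task_loss + (1.0 - alpha) * distill_loss
```

In this sketch, the foundation model is simply one entry in `teacher_logits_list`; adding complementary teachers adds more entries, and the gating weights let the student lean on whichever teacher is most informative for a given example.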
Keywords
* Artificial intelligence
* Distillation
* Knowledge distillation
* Student model