
Summary of Learning to Maximize Mutual Information for Chain-of-Thought Distillation, by Xin Chen et al.


Learning to Maximize Mutual Information for Chain-of-Thought Distillation

by Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding

First submitted to arXiv on: 5 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract; read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Knowledge distillation, a crucial technique for efficient AI deployment, has seen significant advances with the introduction of Distilling Step-by-Step (DSS), a method that leverages chain-of-thought (CoT) distillation. DSS lets smaller models acquire the superior reasoning capabilities of their larger counterparts by generating rationales and predicting labels concurrently through multi-task learning. However, this approach overlooks the intrinsic relationship between the two training tasks, leading to ineffective knowledge integration. This paper investigates the mutual relationship between the tasks from an Information Bottleneck perspective, formulating it as maximizing the mutual information between their representation features, and proposes a variational, learning-based approach to solve the resulting optimization problem (see the illustrative sketch after these summaries). Experimental results across four datasets demonstrate that the proposed method outperforms the state-of-the-art DSS baseline, offering valuable insights for future research on language model distillation and CoT applications.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine taking a super smart AI model and teaching a smaller one how to think like it. This process is called knowledge distillation. Researchers have developed a way to do this called Distilling Step-by-Step (DSS). It works by giving the small model two jobs: come up with reasons why something is true, and predict what category something belongs to. The problem is that these two jobs are connected, but DSS doesn't account for that. This paper makes DSS better by modeling how the two jobs relate to each other. The authors tested the new method on four different sets of data and found that it works even better than the old one.
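
To make the core idea more concrete, below is a minimal, hypothetical sketch of a learning-based variational lower bound on the mutual information between the two task representations (label-prediction features and rationale-generation features). It uses an InfoNCE-style estimator in PyTorch; the class name InfoNCEMutualInfo, the feature dimensions, the temperature, and the choice of estimator are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch (not the paper's code): a learned variational lower bound on the
# mutual information between the label-task and rationale-task representations.
# All names, dimensions, and the InfoNCE-style estimator are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class InfoNCEMutualInfo(nn.Module):
    """Estimates a lower bound on I(Z_label; Z_rationale) with a learned critic."""

    def __init__(self, dim_label: int, dim_rationale: int, dim_proj: int = 128):
        super().__init__()
        # Small projection heads act as the learned critic f(z_l, z_r) = <g(z_l), h(z_r)>.
        self.proj_label = nn.Linear(dim_label, dim_proj)
        self.proj_rationale = nn.Linear(dim_rationale, dim_proj)

    def forward(self, z_label: torch.Tensor, z_rationale: torch.Tensor) -> torch.Tensor:
        # z_label: (B, dim_label) pooled features from the label-prediction head
        # z_rationale: (B, dim_rationale) pooled features from the rationale head
        a = F.normalize(self.proj_label(z_label), dim=-1)
        b = F.normalize(self.proj_rationale(z_rationale), dim=-1)
        logits = a @ b.t() / 0.07          # (B, B) similarity matrix, temperature 0.07
        targets = torch.arange(a.size(0), device=a.device)
        # InfoNCE: diagonal pairs come from the joint distribution, off-diagonal pairs
        # approximate the product of marginals. Minimizing this cross-entropy loss
        # maximizes the InfoNCE lower bound on the mutual information.
        return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    B = 16
    z_label = torch.randn(B, 768)       # stand-in for label-task features
    z_rationale = torch.randn(B, 768)   # stand-in for rationale-task features
    mi_loss = InfoNCEMutualInfo(768, 768)(z_label, z_rationale)
    # total_loss = label_loss + rationale_loss + lambda_mi * mi_loss  (lambda_mi: tunable weight)
    print(mi_loss.item())

In a setup like this, the mutual-information term would be added to the usual multi-task distillation loss with a tunable weight, encouraging the two task heads to share information rather than train in isolation.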

Keywords

» Artificial intelligence  » Distillation  » Knowledge distillation  » Language model  » Multi task  » Optimization