
Summary of Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning, by Bokai Hu et al.


Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

by Bokai Hu, Sai Ashish Somayajula, Xin Pan, Zihan Huang, Pengtao Xie

First submitted to arXiv on: 14 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper focuses on improving the natural language understanding (NLU) abilities of large language models (LLMs), specifically decoder-only transformers. The authors explore two approaches: supervised fine-tuning (SFT) and proximal policy optimization (PPO). To reduce the computational cost, they integrate low-rank adaptation (LoRA) layers so that only a small set of parameters is updated during both SFT and PPO. The results show that these LLMs underperform models like BERT-base on some NLU tasks even after SFT, but PPO can close the gap by treating each generated token as an action and maximizing a reward based on alignment with the ground-truth answers. The experiments demonstrate a 6.3-point gain over SFT on GLUE, surpassing zero-shot and few-shot performance, and PPO also outperforms BERT-large on both GLUE and SuperGLUE. A minimal illustrative code sketch of this LoRA-plus-PPO setup follows the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making language models better at understanding human language. Language models are good at generating text, but they can struggle with tasks that test how well they understand what the text means. The authors try two ways to make them better: fine-tuning a small part of the model on task data, and using a reinforcement learning technique called proximal policy optimization (PPO). PPO rewards the model when the answers it generates match the correct ones, so it learns to give more accurate responses. The results show that this approach can really improve the model’s understanding abilities, especially when compared to other models like BERT.
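To make the setup more concrete, below is a minimal sketch, not the authors' implementation, of how a decoder-only LLM might be wrapped with LoRA adapters and scored with a ground-truth-alignment reward for PPO-style training. It uses Hugging Face Transformers and PEFT; the backbone name ("gpt2"), the LoRA hyperparameters and target modules, the prompt format, and the exact-match reward are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch, assuming a GPT-2 backbone: attach LoRA adapters to a
# decoder-only LLM and define a simple reward that scores a generated answer
# by its agreement with the ground truth. Hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder decoder-only backbone (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA: only the low-rank adapter matrices are trained during SFT and PPO;
# the frozen backbone weights stay untouched, cutting compute and memory.
lora_cfg = LoraConfig(
    r=8,                        # adapter rank (assumed)
    lora_alpha=16,              # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection for GPT-2; model-specific
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction is trainable


def reward_fn(generated_answer: str, gold_answer: str) -> float:
    """Reward a completed generation by its alignment with the ground truth.

    The paper's PPO setup treats each generated token as an action and rewards
    the final answer for matching the gold label; exact string match is one
    simple way to instantiate such a reward.
    """
    return 1.0 if generated_answer.strip().lower() == gold_answer.strip().lower() else 0.0


# Example: an NLU instance framed as text generation (hypothetical prompt format).
prompt = ("Premise: The cat sat on the mat. Hypothesis: A cat is sitting. "
          "Answer (entailment/contradiction/neutral):")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=3)
answer = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer, reward_fn(answer, "entailment"))
```

In an actual PPO loop (for example with a reinforcement learning library such as TRL), rewards like this would be computed for each sampled answer and used to update only the LoRA parameters, which is what keeps the approach lightweight.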

Keywords

» Artificial intelligence  » Alignment  » Bert  » Decoder  » Few shot  » Fine tuning  » Language understanding  » Lora  » Low rank adaptation  » Optimization  » Supervised  » Token  » Zero shot