Summary of Reinforcement Learning with Token-level Feedback for Controllable Text Generation, by Wendi Li et al.
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
by Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng
First submitted to arXiv on: 18 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | High Difficulty Summary: Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper proposes a novel reinforcement learning algorithm, called TOLE (TOken-LEvel rewards), to control the generations of large language models (LLMs) for real-world applications. Existing methods suffer from overfitting or semantic collapse. TOLE formulates token-level rewards and employs a “first-quantize-then-noise” paradigm to enhance robustness. Experimental results show that the algorithm achieves superior performance on both single-attribute and multi-attribute control tasks. |
| Low | GrooveSquid.com (original content) | Low Difficulty Summary: The paper proposes an algorithm called TOLE to help computers generate text while following certain rules or guidelines. This is important because computers can sometimes generate text that doesn’t make sense or uses the wrong tone. The researchers tried different methods, but these had problems like overfitting (when a model becomes too specialized) and semantic collapse (when a model forgets what it’s supposed to do). TOLE uses a new way of giving rewards to computers based on individual words, which helps them learn better. This approach also makes the algorithm more robust to changes in rules or guidelines. |
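The “first-quantize-then-noise” idea in the medium summary can be illustrated with a minimal sketch. Note the paper's exact formulation is not given in this summary, so the bin scheme, the uniform noise, and the function `quantize_then_noise` below are illustrative assumptions, not the authors' implementation:

```python
import random

def quantize_then_noise(token_rewards, num_bins=5, noise_scale=0.1):
    """Hypothetical 'first-quantize-then-noise' step on token-level rewards.

    Each raw per-token reward is snapped to the midpoint of one of
    `num_bins` equal-width bins over [min, max], then perturbed with a
    small uniform noise term so the policy cannot overfit exact scores.
    """
    lo, hi = min(token_rewards), max(token_rewards)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal rewards
    out = []
    for r in token_rewards:
        # Quantize: map the reward to its bin midpoint.
        b = min(int((r - lo) / width), num_bins - 1)
        q = lo + (b + 0.5) * width
        # Noise: add a small perturbation for robustness.
        out.append(q + random.uniform(-noise_scale, noise_scale))
    return out

# Example: one reward per generated token.
smoothed = quantize_then_noise([0.1, 0.5, 0.9, 0.3], num_bins=4)
```

The intuition is that quantizing removes spurious precision from the reward model's per-token scores, and the added noise further discourages the policy from latching onto any single exact reward value.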
Keywords
* Artificial intelligence
* Overfitting
* Reinforcement learning
* Token