Summary of Improving Autoregressive Training with Dynamic Oracles, by Jianing Yang et al.
Improving Autoregressive Training with Dynamic Oracles
by Jianing Yang, Harshine Visvanathan, Yilin Wang, Xinyi Hu, Matthew Gormley
First submitted to arXiv on: 13 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract; read it on the arXiv page. |
| Medium | GrooveSquid.com (original content) | This paper addresses a common issue in Natural Language Processing (NLP) tasks that involve sequential decision-making, such as sequence tagging or text generation. The standard training methods for these tasks, including maximum likelihood and scheduled sampling, suffer from exposure bias and from a mismatch between the training objective and the metric used at inference. DAgger, an imitation-learning algorithm, offers a remedy, but it requires a metric-specific dynamic oracle, and dynamic oracles do not yet exist for many common metrics, such as span-based F1, ROUGE, and BLEU. In this work, the authors develop novel dynamic oracles for decomposable metrics like span-based F1 and show that these oracles preserve DAgger’s no-regret guarantee. They evaluate the approach on named entity recognition (NER), text summarization, and machine translation (MT). While results vary across tasks, DAgger with a dynamic oracle outperforms the baseline techniques on NER and text summarization. (A toy illustration of what a dynamic oracle computes appears below this table.) |
| Low | GrooveSquid.com (original content) | This research helps computers make better decisions when working with language. The way we currently train computer models to understand and generate text has known flaws. The authors use a training method called DAgger that makes the process more accurate, but applying it correctly requires special tools called “dynamic oracles.” They built these tools for certain types of scoring metrics used in language tasks like recognizing names, summarizing texts, and translating between languages. The results show that their approach works well for some tasks, but not all. It is an important step toward making computers better at understanding human language. |
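To make the idea of a “dynamic oracle” concrete, here is a minimal Python sketch for span-based F1 over BIO tag sequences. This is not the paper’s algorithm (the authors derive efficient oracles); the sketch simply brute-forces every possible completion, which only works for toy inputs, and all names in it (LABELS, dynamic_oracle, and so on) are illustrative assumptions.

```python
# A toy dynamic oracle for span-based F1 over BIO tag sequences.
# NOT the paper's algorithm: it brute-forces all completions, which is
# exponential and only workable for very short sequences and tag sets.
from itertools import product

LABELS = ["O", "B-PER", "I-PER"]  # illustrative tag set (assumption)

def spans(tags):
    """Extract (type, start, end) spans from a BIO sequence (strict IOB2:
    an I- tag that does not continue an open span is ignored)."""
    out, start, typ = set(), None, None
    for i, t in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        if start is not None and (t == "O" or t.startswith("B-") or t[2:] != typ):
            out.add((typ, start, i))
            start, typ = None, None
        if t.startswith("B-"):
            start, typ = i, t[2:]
    return out

def span_f1(pred, gold):
    """Span-based F1 between a predicted and a gold tag sequence."""
    p, g = spans(pred), spans(gold)
    if not p or not g:
        return float(p == g)  # 1.0 only when both have no spans
    prec, rec = len(p & g) / len(p), len(p & g) / len(g)
    return 2 * prec * rec / (prec + rec)

def dynamic_oracle(prefix, gold):
    """Return the next tag that maximizes the best span F1 still reachable
    from `prefix`, found here by enumerating every possible completion."""
    remaining = len(gold) - len(prefix) - 1
    best_tag, best_score = None, -1.0
    for tag in LABELS:
        for suffix in product(LABELS, repeat=remaining):
            score = span_f1(list(prefix) + [tag] + list(suffix), gold)
            if score > best_score:
                best_tag, best_score = tag, score
    return best_tag

# The model mislabeled position 1 and left the gold path; the oracle still
# names the best recovery action rather than blindly copying the gold tag.
gold = ["B-PER", "I-PER", "O", "B-PER"]
print(dynamic_oracle(["B-PER", "O"], gold))  # -> "O": give up the broken
# first span and keep the last entity intact, reaching F1 = 0.5
```

In a DAgger-style training loop, the model would roll in with its own predictions, the oracle would label each visited prefix with the best next action as above, and the model would then be trained on those (prefix, action) pairs; exposing the model to its own mistakes during training is what addresses exposure bias.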
Keywords
» Artificial intelligence » BLEU » Inference » Likelihood » Named entity recognition » Natural language processing » NER » NLP » ROUGE » Summarization » Text generation » Translation