
Summary of SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, by Ke Ye et al.


SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

by Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar

First submitted to arXiv on: 24 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original GrooveSquid.com content)
The proposed SpacTor training procedure combines the span corruption (SC) and replaced token detection (RTD) objectives with a two-stage curriculum. This hybrid approach matches the downstream NLP task performance of standard SC pre-training while requiring fewer iterations and FLOPs; alternatively, given the same computing budget, SpacTor yields improved benchmark performance. The gains hinge on the two-stage schedule, which optimizes the hybrid objective initially before transitioning to the SC loss alone. This approach has potential applications in large language model training.
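
As a rough sketch of how such a two-stage schedule could be wired up, the following Python snippet (our illustration, not the authors' implementation) switches from a hybrid SC + RTD loss to the plain SC loss after a fixed number of steps. The stub loss functions and the values of `stage1_steps` and `rtd_weight` are hypothetical placeholders, not figures from the paper.

```python
# Minimal sketch of a SpacTor-style two-stage schedule (our illustration,
# not the authors' code). Stage 1 optimizes a hybrid span-corruption + RTD
# objective; stage 2 continues with the span-corruption loss alone.

def span_corruption_loss(batch):
    """Stub for the standard T5 span-corruption (SC) loss."""
    return 0.0

def replaced_token_detection_loss(batch):
    """Stub for the ELECTRA-style replaced token detection (RTD) loss."""
    return 0.0

def spactor_loss(batch, step, stage1_steps=100_000, rtd_weight=1.0):
    """Return the training loss for the given pre-training step."""
    sc = span_corruption_loss(batch)
    if step < stage1_steps:
        # Stage 1: jointly optimize SC and RTD on the same batch.
        return sc + rtd_weight * replaced_token_detection_loss(batch)
    # Stage 2: fall back to the SC objective for the rest of pre-training.
    return sc
```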

Low Difficulty Summary (original GrooveSquid.com content)
SpacTor is a new way to train big language models that makes better use of the information seen during training. Right now, training these models is slow and inefficient. SpacTor combines two techniques to make the process more effective: an objective that hides parts of sentences (span corruption) and another that detects when words have been swapped out (replaced token detection). It also uses a special two-stage training schedule, which is key to making the combination work. Tested on a range of NLP tasks, SpacTor matched the traditional way of training these models while using less computing power.
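
To see what span corruption looks like in practice, here is a toy Python example (ours, not code from the paper) that turns one sentence into an encoder input and a decoder target using sentinel tokens, following the standard T5 recipe. The example sentence and the hand-picked span positions are purely illustrative.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel; return (input, target).

    `spans` must be non-overlapping and sorted by start index.
    """
    inputs, targets = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[prev:start])   # keep the text before the span
        inputs.append(sentinel)             # hide the span behind a sentinel
        targets.append(sentinel)            # target: sentinel, then the span
        targets.extend(tokens[start:end])
        prev = end
    inputs.extend(tokens[prev:])            # keep the trailing text
    targets.append(f"<extra_id_{len(spans)}>")  # end-of-targets sentinel
    return inputs, targets

sentence = "Thank you for inviting me to your party last week".split()
inp, tgt = corrupt_spans(sentence, spans=[(2, 4), (8, 9)])
print(" ".join(inp))  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(tgt))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```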

Keywords

* Artificial intelligence  * Large language model  * NLP  * Objective function  * Token