
Summary of RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, by Aleksandar Botev et al.


RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

by Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas

First submitted to arXiv on: 11 Apr 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary — written by the paper authors
This is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary — written by GrooveSquid.com (original content)
The paper introduces RecurrentGemma, a family of open language models built on Google’s Griffin architecture, which combines linear recurrences with local attention. The model maintains a fixed-size state, reducing memory usage and enabling efficient inference on long sequences. Two model sizes are provided, with 2B and 9B parameters, each in pre-trained and instruction-tuned variants. The RecurrentGemma models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.

Low Difficulty Summary — written by GrooveSquid.com (original content)
This paper introduces a new type of language model that uses an architecture called Griffin. This architecture is special because it combines two things: linear recurrences and local attention. This helps the model understand language really well. The model keeps an internal state with a fixed size, which means it uses less memory and can work with long pieces of text efficiently. There are two sizes of models: one with 2 billion parameters and one with 9 billion parameters. Each model comes in two versions: one that is pre-trained and one that is tuned to follow instructions. Even though the new models were trained on less data, they perform just as well as similar-sized Gemma models.

Keywords

» Artificial intelligence  » Attention  » Inference  » Language model