
Scaling Granite Code Models to 128K Context

by Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

First submitted to arXiv on: 18 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL); Software Engineering (cs.SE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces long-context Granite code models that can process inputs of up to 128K tokens. To achieve this, the authors use a lightweight continual-pretraining approach that gradually increases the RoPE (Rotary Position Embedding) base frequency and combines repository-level file packing with length-upsampled long-context data. The team also releases instruction-tuned models with long-context support, obtained by finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, the new long-context models show significant improvements on tasks requiring long contexts without degrading performance on regular code completion benchmarks such as HumanEval. The long-context Granite code models are released under an Apache 2.0 license for both research and commercial use. (An illustrative sketch of the RoPE base-frequency idea appears after the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about a way to make computers better at understanding really long pieces of text, like articles or books. Right now, computers can only understand short pieces of text, which is okay for things like answering simple questions, but not good enough for more complex tasks. The authors found a way to make their computer models, called Granite code models, able to understand longer texts by using something called pretraining and packing files in a special way. They also made new versions of the models that are even better at understanding long texts. These new models can do some tasks much better than before without getting worse at other tasks. The authors want to share their new models with others, so they’re releasing them under a special license.
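
The medium-difficulty summary mentions gradually increasing the RoPE base frequency during continual pretraining. The sketch below is a minimal illustration of what that base frequency (often called "theta") controls in a standard rotary position embedding; the function names, tensor shapes, and theta values here are illustrative assumptions and do not reflect the paper's actual schedule or implementation.

```python
# Minimal sketch of rotary position embeddings (RoPE) with an adjustable
# base frequency ("theta"). Increasing theta stretches the rotation periods,
# which is the basic knob behind many long-context extensions. The exact
# values and schedule used for the Granite code models are not given in
# this summary; the numbers below are illustrative only.

import torch


def rope_angles(seq_len: int, head_dim: int, theta: float = 10_000.0) -> torch.Tensor:
    """Return the (seq_len, head_dim // 2) matrix of rotation angles."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)  # angle[p, i] = p / theta^(2i/d)


def apply_rope(x: torch.Tensor, theta: float = 10_000.0) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, head_dim) by position."""
    seq_len, head_dim = x.shape
    angles = rope_angles(seq_len, head_dim, theta)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]          # interleave dimensions in pairs
    rotated = torch.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated


# A larger base frequency slows the per-dimension rotation, so distant
# positions remain distinguishable as the context grows (theta values here
# are hypothetical, not the paper's).
short_context = apply_rope(torch.randn(4_096, 64), theta=10_000.0)
long_context = apply_rope(torch.randn(8_192, 64), theta=1_000_000.0)
```

The design intuition is that raising theta keeps relative positions well separated over much longer sequences; the paper's actual frequency schedule, data mixing, and training details are described in the original abstract linked above.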

Keywords

» Artificial intelligence  » Embedding  » Machine learning  » Pretraining