
Scaling Granite Code Models to 128K Context

by Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

First submitted to arXiv on: 18 Jul 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL); Software Engineering (cs.SE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces long-context Granite code models that can process inputs of up to 128K tokens. To achieve this, the authors use a lightweight continual-pretraining approach that gradually increases the RoPE (Rotary Position Embedding) base frequency and combines repository-level file packing with length-upsampled long-context data. The team also releases instruction-tuned models with long-context support, obtained by finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, the new long-context models show significant improvements on tasks requiring long contexts without degrading performance on regular code completion benchmarks such as HumanEval. The long-context Granite code models are released under an Apache 2.0 license for both research and commercial use. (An illustrative sketch of the RoPE base-frequency idea appears after the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about a way to make computers better at understanding really long pieces of text, like articles or books. Right now, computers can only understand short pieces of text, which is okay for things like answering simple questions, but not good enough for more complex tasks. The authors found a way to make their computer models, called Granite code models, able to understand longer texts by using something called pretraining and packing files in a special way. They also made new versions of the models that are even better at understanding long texts. These new models can do some tasks much better than before without getting worse at other tasks. The authors want to share their new models with others, so they’re releasing them under a special license.
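
The medium-difficulty summary mentions gradually increasing the RoPE base frequency during continual pretraining. The sketch below is a minimal illustration of what that base frequency (often called "theta") controls in a standard rotary position embedding; the function names, tensor shapes, and theta values here are illustrative assumptions and do not reflect the paper's actual schedule or implementation.

```python
# Minimal sketch of rotary position embeddings (RoPE) with an adjustable
# base frequency ("theta"). Increasing theta stretches the rotation periods,
# which is the basic knob behind many long-context extensions. The exact
# values and schedule used for the Granite code models are not given in
# this summary; the numbers below are illustrative only.

import torch


def rope_angles(seq_len: int, head_dim: int, theta: float = 10_000.0) -> torch.Tensor:
    """Return the (seq_len, head_dim // 2) matrix of rotation angles."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)  # angle[p, i] = p / theta^(2i/d)


def apply_rope(x: torch.Tensor, theta: float = 10_000.0) -> torch.Tensor:
    """Rotate query/key vectors x of shape (seq_len, head_dim) by position."""
    seq_len, head_dim = x.shape
    angles = rope_angles(seq_len, head_dim, theta)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]          # interleave dimensions in pairs
    rotated = torch.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated


# A larger base frequency slows the per-dimension rotation, so distant
# positions remain distinguishable as the context grows (theta values here
# are hypothetical, not the paper's).
short_context = apply_rope(torch.randn(4_096, 64), theta=10_000.0)
long_context = apply_rope(torch.randn(8_192, 64), theta=1_000_000.0)
```

The design intuition is that raising theta keeps relative positions well separated over much longer sequences; the paper's actual frequency schedule, data mixing, and training details are described in the original abstract linked above.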

Keywords

» Artificial intelligence  » Embedding  » Machine learning  » Pretraining