Summary of Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, by the Jamba Team (Barak Lenz et al.)
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
by Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar, Itay Dalmedigos, Jhonathan Osin, Julie Fadlon, Maria Rozman, Matan Danos, Michael Gokhman, Mor Zusman, Naama Gidron, Nir Ratner, Noam Gat, Noam Rozen, Oded Fried, Ohad Leshno, Omer Antverg, Omri Abend, Opher Lieber, Or Dagan, Orit Cohavi, Raz Alon, Ro’i Belson, Roi Cohen, Rom Gilad, Roman Glozman, Shahar Lev, Shaked Meirom, Tal Delbari, Tal Ness, Tomer Asida, Tom Ben Gal, Tom Braude, Uriya Pumerantz, Yehoshua Cohen, Yonatan Belinkov, Yuval Globerson, Yuval Peleg Levy, Yoav Shoham
First submitted to arXiv on: 22 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available via the arXiv listing above. |
Medium | GrooveSquid.com (original content) | The paper introduces Jamba-1.5, a family of instruction-tuned language models built on a hybrid Transformer-Mamba mixture-of-experts architecture, which achieves high throughput and low memory usage while maintaining or improving quality. The models come in two sizes: Large, with 94 billion active parameters, and Mini, with 12 billion active parameters. Both can process contexts of up to 256K tokens, making them well suited to conversational and instruction-following tasks. To support cost-effective inference, the authors propose ExpertsInt8, a novel quantization technique that lets the models run on machines with limited GPU resources without compromising quality (a brief illustrative sketch follows this table). Evaluated on a range of benchmarks, the models achieve excellent results and outperform other open-weight models on long-context tasks. |
Low | GrooveSquid.com (original content) | The paper creates new language models called Jamba-1.5 that can understand and respond to very long conversations. These models are better at processing large amounts of text than others like them. They’re available in two sizes: a big one with lots of knowledge and a smaller one that’s still really smart. The authors also came up with a new way to make these models run on less powerful computers, not just the biggest ones. When they tested the models, they did really well on tasks like answering questions and following instructions. |
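The abstract describes ExpertsInt8 only at a high level. The snippet below is a minimal, hypothetical sketch (plain PyTorch, made-up tensor shapes and function names, not AI21’s fused-kernel implementation) of the general idea it points to: store the mixture-of-experts weights in int8 with per-channel scales and dequantize them on the fly at inference time.

```python
# Minimal, hypothetical sketch of int8 expert-weight quantization with
# on-the-fly dequantization. Shapes and function names are illustrative;
# the real ExpertsInt8 reportedly dequantizes inside a fused MoE kernel
# rather than in Python.
import torch

def quantize_expert_int8(weight: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a [out, in] weight."""
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale  # int8 weights plus float per-channel scales

def expert_matmul_int8(x: torch.Tensor, q_weight: torch.Tensor, scale: torch.Tensor):
    """Dequantize the stored int8 weight and apply the expert's projection."""
    w = q_weight.to(x.dtype) * scale.to(x.dtype)  # int8 -> float, per output channel
    return x @ w.t()

# Toy usage: one expert's projection, checked against the full-precision result.
torch.manual_seed(0)
w_fp = torch.randn(4096, 1024)   # hypothetical expert weight (not Jamba's real sizes)
x = torch.randn(2, 1024)         # two token activations
q, s = quantize_expert_int8(w_fp)
err = (expert_matmul_int8(x, q, s) - x @ w_fp.t()).abs().max().item()
print(f"max abs error vs. full precision: {err:.4f}")
```

Since the expert weights account for the bulk of the model’s parameters, keeping them in int8 roughly halves their memory footprint relative to bf16, which is how the paper reports fitting Jamba-1.5-Large with 256K-token contexts onto a single machine with eight 80GB GPUs.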
Keywords
» Artificial intelligence » Inference » Language model » Quantization » Transformer