Summary of Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, by the Jamba Team (Barak Lenz et al.)
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
by Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, Daniel Jannai, Dor Muhlgay, Dor Zimberg, Edden M Gerber, Elad Dolev, Eran Krakovsky, Erez Safahi, Erez Schwartz, Gal Cohen, Gal Shachaf, Haim Rozenblum, Hofit Bata, Ido Blass, Inbal Magar, Itay Dalmedigos, Jhonathan Osin, Julie Fadlon, Maria Rozman, Matan Danos, Michael Gokhman, Mor Zusman, Naama Gidron, Nir Ratner, Noam Gat, Noam Rozen, Oded Fried, Ohad Leshno, Omer Antverg, Omri Abend, Opher Lieber, Or Dagan, Orit Cohavi, Raz Alon, Ro’i Belson, Roi Cohen, Rom Gilad, Roman Glozman, Shahar Lev, Shaked Meirom, Tal Delbari, Tal Ness, Tomer Asida, Tom Ben Gal, Tom Braude, Uriya Pumerantz, Yehoshua Cohen, Yonatan Belinkov, Yuval Globerson, Yuval Peleg Levy, Yoav Shoham
First submitted to arXiv on: 22 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available via the arXiv listing above. |
Medium | GrooveSquid.com (original content) | The paper introduces Jamba-1.5, a family of instruction-tuned language models built on a hybrid Transformer-Mamba mixture-of-experts architecture, which achieves high throughput and low memory usage while maintaining or improving quality. The models come in two sizes: Large, with 94 billion active parameters, and Mini, with 12 billion active parameters. Both can process contexts of up to 256K tokens, making them well suited to conversational and instruction-following tasks. To support cost-effective inference, the authors propose ExpertsInt8, a novel quantization technique that lets the models run on machines with limited GPU resources without compromising quality (a brief illustrative sketch follows this table). Evaluated on a range of benchmarks, the models achieve excellent results and outperform other open-weight models on long-context tasks. |
Low | GrooveSquid.com (original content) | The paper creates new language models called Jamba-1.5 that can understand and respond to very long conversations. These models are better at processing large amounts of text than others like them. They’re available in two sizes: a big one with lots of knowledge and a smaller one that’s still really smart. The authors also came up with a new way to make these models run on less powerful computers, not just the biggest ones. When they tested the models, they did really well on tasks like answering questions and following instructions. |
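The abstract describes ExpertsInt8 only at a high level. The snippet below is a minimal, hypothetical sketch (plain PyTorch, made-up tensor shapes and function names, not AI21’s fused-kernel implementation) of the general idea it points to: store the mixture-of-experts weights in int8 with per-channel scales and dequantize them on the fly at inference time.

```python
# Minimal, hypothetical sketch of int8 expert-weight quantization with
# on-the-fly dequantization. Shapes and function names are illustrative;
# the real ExpertsInt8 reportedly dequantizes inside a fused MoE kernel
# rather than in Python.
import torch

def quantize_expert_int8(weight: torch.Tensor):
    """Symmetric per-output-channel int8 quantization of a [out, in] weight."""
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale  # int8 weights plus float per-channel scales

def expert_matmul_int8(x: torch.Tensor, q_weight: torch.Tensor, scale: torch.Tensor):
    """Dequantize the stored int8 weight and apply the expert's projection."""
    w = q_weight.to(x.dtype) * scale.to(x.dtype)  # int8 -> float, per output channel
    return x @ w.t()

# Toy usage: one expert's projection, checked against the full-precision result.
torch.manual_seed(0)
w_fp = torch.randn(4096, 1024)   # hypothetical expert weight (not Jamba's real sizes)
x = torch.randn(2, 1024)         # two token activations
q, s = quantize_expert_int8(w_fp)
err = (expert_matmul_int8(x, q, s) - x @ w_fp.t()).abs().max().item()
print(f"max abs error vs. full precision: {err:.4f}")
```

Since the expert weights account for the bulk of the model’s parameters, keeping them in int8 roughly halves their memory footprint relative to bf16, which is how the paper reports fitting Jamba-1.5-Large with 256K-token contexts onto a single machine with eight 80GB GPUs.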
Keywords
» Artificial intelligence » Inference » Language model » Quantization » Transformer