Summary of MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, by Zechun Liu et al.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
by Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra
First submitted to arxiv on: 22 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This research proposes efficient large language models (LLMs) with fewer than a billion parameters that are suitable for deployment on mobile devices, addressing cloud costs and latency. Contrary to the prevailing belief that data and parameter quantity alone determine model quality, the study highlights the importance of model architecture at sub-billion scale. Leveraging deep-and-thin architectures together with embedding sharing and grouped-query attention, the researchers build a strong baseline network, MobileLLM, which outperforms the preceding 125M/350M state-of-the-art models by 2.7% and 4.3%, respectively. The paper further proposes an immediate block-wise weight-sharing approach that adds depth with minimal latency overhead, yielding additional accuracy gains (a hedged code sketch of two of these ideas follows this table). The resulting MobileLLM model family shows significant improvements on chat benchmarks and API-calling tasks, demonstrating the potential of small models for common on-device use cases. |
| Low | GrooveSquid.com (original content) | This research aims to make language models small and efficient enough to run smoothly on mobile devices. Currently, these models are too big and slow for mobile use. The researchers tried different architectures and found that some work better than others at keeping a model small while still good at understanding language. They developed a new model called MobileLLM that performs really well compared to other models of similar size. This could be useful for things like chatbots or virtual assistants on your phone. |
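Below is a minimal, hypothetical PyTorch sketch of two of the techniques mentioned in the medium-difficulty summary: input/output embedding sharing (tying the output projection to the token-embedding weights) and immediate block-wise weight sharing (applying the same transformer block several times in a row to add depth without adding parameters). The names (`TinyBlock`, `TinySharedLM`), the layer sizes, and the omission of causal masking and positional encodings are illustrative assumptions; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """A minimal transformer-block stand-in (self-attention + MLP).

    Causal masking and positional encodings are omitted for brevity.
    """

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class TinySharedLM(nn.Module):
    """Toy LM illustrating embedding sharing and immediate block-wise weight sharing."""

    def __init__(self, vocab_size=32000, dim=256, n_heads=8,
                 n_unique_blocks=4, repeats_per_block=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.blocks = nn.ModuleList(
            [TinyBlock(dim, n_heads) for _ in range(n_unique_blocks)]
        )
        self.repeats = repeats_per_block
        self.norm = nn.LayerNorm(dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        # Embedding sharing: the output projection reuses the input embedding
        # weights, so the large vocab-by-dim matrix is stored only once.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for block in self.blocks:
            # Immediate block-wise weight sharing: each block is applied
            # `repeats` times back-to-back, deepening the network (8 effective
            # layers here) while storing weights for only 4 unique blocks.
            for _ in range(self.repeats):
                x = block(x)
        return self.lm_head(self.norm(x))


model = TinySharedLM()
logits = model(torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 32000])
```

The "immediate" repetition, rather than sharing weights between distant layers, is what keeps the latency overhead small in the paper's setting: the block's weights can remain resident in fast on-chip memory between the repeated applications.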
Keywords
* Artificial intelligence
* Attention
* Embedding