Summary of Paramanu-ganita: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?, by Mitodru Niyogi et al.
PARAMANU-GANITA: Can Small Math Language Models Rival with Large Language Models on Mathematical Reasoning?
by Mitodru Niyogi, Arnab Bhattacharya
First submitted to arXiv on: 22 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper explores how small generative language models (SLMs), pre-trained from scratch with a domain-specific tokenizer and fine-tuned with Chain-of-Thought (CoT) instructions, perform on mathematical reasoning tasks. The authors propose Paramanu-Ganita, a novel decoder-only SLM with 208 million parameters, and show that it competes with much larger language models (LLMs); they also evaluate the model’s environmental sustainability and cost efficiency. For pre-training, they use a mixed mathematical corpus of web pages, source code, textbooks, and mathematical lecture notes. They develop a math-specialized BPE tokenizer and fine-tune Paramanu-Ganita with CoT instructions on the MetaMathQA dataset. Despite being 34 times smaller than 7B LLMs, Paramanu-Ganita outperforms generalist LLMs by roughly 30 percentage points and math-specialized LLMs by 3–23 percentage points in GSM8K test accuracy. The model also performs well on other benchmarks, including MATH, LogiQA, MMLU (high school and college levels), and competitive exams. These findings highlight the potential of small generative language models for mathematical reasoning while remaining environmentally sustainable and cost-efficient. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Paramanu-Ganita is a new way to train language models that can do math problems really well. It’s like a special kind of brain that can understand and solve math problems, even if they’re tricky! To make this happen, the researchers used a big dataset of math texts, code, and questions from online forums. They also created a special tokenizer (like a dictionary) that only includes words related to math. Then, they fine-tuned Paramanu-Ganita on a specific dataset called MetaMathQA. This made it super good at doing math problems! The results are amazing: even though Paramanu-Ganita is much smaller than other language models, it can do math problems just as well or even better! It’s like having a superpower in your pocket. |
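The summaries above mention a "math-specialized BPE tokenizer" trained on the mathematical corpus. The paper's own tokenizer code is not shown here, but the core byte-pair-encoding idea can be sketched in a few lines: start from characters and repeatedly merge the most frequent adjacent pair, so that on math-heavy text, common digit and operator sequences become single tokens. The corpus strings and merge count below are made-up illustrations, not the authors' data.

```python
import re
from collections import Counter

def pair_counts(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def apply_merge(pair, vocab):
    # Fuse the chosen pair into one symbol wherever it occurs as whole symbols.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    fused = "".join(pair)
    return {pattern.sub(fused, word): freq for word, freq in vocab.items()}

def train_bpe(corpus, num_merges):
    # Start from individual characters; greedily merge the most frequent pair.
    vocab = dict(Counter(" ".join(token) for token in corpus))
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = apply_merge(best, vocab)
        merges.append(best)
    return merges

# Toy math-flavored corpus: "12" and "12+" are frequent, so they merge first.
merges = train_bpe(["12+34", "12+56", "12"], 2)
print(merges)  # [('1', '2'), ('12', '+')]
```

A production tokenizer (e.g. one trained with a standard BPE library) adds byte-level fallback, special tokens, and much larger merge tables, but the greedy merge loop is the same; specializing it to a math corpus simply biases which pairs get merged.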
Keywords
» Artificial intelligence » Decoder » Fine tuning » Tokenizer