Summary of HARE: HumAn pRiors, a key to small language model Efficiency, by Lingyun Zhang et al.
HARE: HumAn pRiors, a key to small language model Efficiency
by Lingyun Zhang, Bin Jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu
First submitted to arXiv on: 17 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper focuses on improving deep learning models by incorporating human prior knowledge. The rise of Large Language Models (LLMs) has shifted attention toward scaling model size and data volume, overlooking the importance of human priors. Existing Small Language Models (SLMs) rely heavily on large-scale web-scraped training data and neglect to properly incorporate human priors, which hinders their efficiency in resource-constrained settings. The authors propose a principle for leveraging human priors in data construction: train high-performance SLMs on concise datasets that combine semantic diversity with consistent quality while avoiding benchmark data leakage (a rough sketch of this kind of filtering appears after the table). Following this principle, they train an SLM named HARE-1.1B, which performs favorably against state-of-the-art SLMs on large-scale benchmark datasets. |
Low | GrooveSquid.com (original content) | This paper is about making computer models better by using information that humans already know. Right now, people focus on making these models bigger and training them with lots of data, but this approach overlooks important human knowledge that could help the models learn faster and more efficiently. The authors suggest a new way to build datasets for small language models that combines human prior knowledge with high-quality data. They train a model named HARE-1.1B, which performs well on big benchmark tests. |
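The data-construction principle described in the medium summary (a concise corpus with semantic diversity, consistent quality, and no benchmark leakage) can be pictured with a small, hypothetical filtering sketch. The function names, length thresholds, and n-gram heuristic below are illustrative assumptions, not the authors’ actual HARE pipeline.

```python
# Hypothetical sketch of the data-construction idea: keep documents that are
# reasonably sized (quality-consistency proxy), not duplicated (diversity proxy),
# and free of word n-gram overlap with benchmark data (leakage check).
from hashlib import md5


def ngrams(text: str, n: int = 8) -> set:
    """Word n-grams used for a crude benchmark-leakage check (n is an arbitrary choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_corpus(candidates, benchmark_texts, min_len=200, max_len=4000):
    """Keep candidate documents that pass the quality, diversity, and leakage gates."""
    benchmark_grams = set()
    for text in benchmark_texts:
        benchmark_grams |= ngrams(text)

    seen = set()        # hashes of documents kept so far, a crude diversity proxy
    corpus = []
    for doc in candidates:
        if not (min_len <= len(doc) <= max_len):    # simple quality-consistency gate
            continue
        digest = md5(doc.lower().encode()).hexdigest()
        if digest in seen:                          # drop exact duplicates
            continue
        if ngrams(doc) & benchmark_grams:           # avoid benchmark data leakage
            continue
        seen.add(digest)
        corpus.append(doc)
    return corpus


# Toy usage: one candidate document and one benchmark question.
clean = build_corpus(
    candidates=["A long explanatory passage about photosynthesis and plant biology. " * 20],
    benchmark_texts=["Which of the following best describes photosynthesis in plants?"],
)
print(len(clean))   # 1: the candidate passes all three gates
```

The paper’s actual pipeline is more involved than this sketch; the snippet only illustrates the three constraints the summary names (concise high-quality data, diversity, and no benchmark contamination).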
Keywords
» Artificial intelligence » Attention » Deep learning