Summary of Ranking LLMs by Compression, by Peijia Guo et al.
Ranking LLMs by compression
by Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang
First submitted to arXiv on: 20 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary | 
|---|---|---|
| High | Paper authors | High Difficulty Summary Read the original abstract here | 
| Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper proposes ranking large language models (LLMs) by how well they perform lossless data compression. The authors show that, when an LLM is used as the prior for arithmetic coding, the compressed length is equivalent to the cumulative negative log probability the model assigns to the text, so the pre-training phase can be viewed as learning optimal coding lengths. Because of this equivalence, the compression ratio used as the evaluation metric can be computed without performing any actual compression, which significantly reduces computational overhead. Five LLMs are used as priors for compression, and their performance is evaluated on challenging natural language processing tasks, including sentence completion, question answering, and coreference resolution. The results show a positive correlation between compression ratio and model performance, suggesting that compression ratio can serve as a general metric for evaluating LLMs. | 
| Low | GrooveSquid.com (original content) | Low Difficulty Summary Large language models (LLMs) are computer programs that understand human language. Researchers have found a new way to measure how good these models are by looking at how well they can compress information. It’s like taking a big book and trying to make it smaller while still keeping all the information inside. The authors tested five different LLMs on some tricky language tasks, such as filling in the blanks of a sentence or answering questions about what was said earlier. They found that how well each model performed at these tasks is related to how well it can compress information. This means we might be able to use this new measurement to figure out which LLMs are the best for certain jobs. | 
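
The medium summary's central idea, that compressed length can be estimated from the model's probabilities without running a compressor, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the function names, the made-up per-token probabilities, and the definition of compression ratio as original size divided by estimated compressed size. A real arithmetic coder would produce a length within a few bits of this estimate.

```python
import math

def code_length_bits(token_probs):
    """Estimated compressed length in bits: the sum of -log2(p) over all tokens."""
    return sum(-math.log2(p) for p in token_probs)

def compression_ratio(text, token_probs):
    """Assumed definition: original size in bits / estimated compressed size in bits."""
    original_bits = 8 * len(text.encode("utf-8"))
    return original_bits / code_length_bits(token_probs)

# Hypothetical probabilities a model might assign to each of four tokens.
text = "abcd"
probs = [0.5, 0.25, 0.25, 0.5]

bits = code_length_bits(probs)          # 1 + 2 + 2 + 1 bits
ratio = compression_ratio(text, probs)  # 32 original bits / estimated bits
print(bits, ratio)
```

Under this assumed definition, a model that assigns higher probabilities to the actual text yields a shorter estimated code length and therefore a higher ratio.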
Keywords
- Artificial intelligence
- Coreference
- Natural language processing
- Question answering




