Summary of Archon: An Architecture Search Framework for Inference-Time Techniques, by Jon Saad-Falcon et al.

Archon: An Architecture Search Framework for Inference-Time Techniques

by Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini

First submitted to arXiv on: 23 Sep 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	The paper's original abstract.
Medium	GrooveSquid.com (original content)	Archon is a modular framework for optimizing large language model (LLM) systems by combining and stacking inference-time techniques. It draws on a diverse set of LLMs and techniques to build systems that are more effective than any of their individual components. Archon defines an extensible design space of techniques such as generation ensembling and repeated sampling. It then recasts the problem of building an LLM system as a hyperparameter optimization objective and uses search techniques to discover optimized architectures for target benchmarks. Evaluated across multiple instruction-following, reasoning, and coding benchmarks, Archon architectures outperform frontier models, with an average accuracy increase of 15.1 percentage points.
Low	GrooveSquid.com (original content)	Archon is a new way to make language models better. It takes many different models and techniques and combines them to create something even more powerful. This helps language models answer questions and complete tasks more accurately. The system uses a design space that includes different techniques, like combining the outputs of several models or sampling a model's answer multiple times. It then searches through this space to find the best combination of models and techniques for a specific task. Archon is tested on many different benchmarks and outperforms other state-of-the-art models.
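
To make the idea of searching a design space concrete, here is a minimal toy sketch in Python. It is not Archon's code or API: the model names, their accuracies, the majority-vote fusion step, and the exhaustive grid search are all assumptions chosen for illustration, and the "models" are simulated rather than real LLMs. It only shows the general pattern of treating the choice of models (ensembling) and the number of samples per model (repeated sampling) as hyperparameters scored against a benchmark.

```python
import itertools
import random

# Hypothetical stand-ins for LLMs: each "model" answers correctly with a
# fixed probability. These names and numbers are made up for this sketch.
MODEL_ACCURACY = {"model_a": 0.6, "model_b": 0.5, "model_c": 0.4}


def generate(model, truth, rng):
    """Return the true answer with the model's accuracy, else a wrong one."""
    if rng.random() < MODEL_ACCURACY[model]:
        return truth
    return truth + rng.randint(1, 3)


def run_architecture(config, truth, rng):
    """Sample candidates from every chosen model, then fuse by majority vote."""
    candidates = [
        generate(model, truth, rng)
        for model in config["models"]                 # ensembling
        for _ in range(config["samples_per_model"])   # repeated sampling
    ]
    # Majority vote is an assumed, very simple fusion step.
    return max(set(candidates), key=candidates.count)


def score(config, benchmark, seed=0):
    """Fraction of benchmark items the architecture answers correctly."""
    rng = random.Random(seed)  # fixed seed keeps the comparison repeatable
    hits = sum(run_architecture(config, t, rng) == t for t in benchmark)
    return hits / len(benchmark)


def search(benchmark):
    """Exhaustive grid search over a tiny design space (a stand-in for
    whatever search technique a real system would use)."""
    names = list(MODEL_ACCURACY)
    model_sets = [
        combo
        for size in range(1, len(names) + 1)
        for combo in itertools.combinations(names, size)
    ]
    space = [
        {"models": models, "samples_per_model": k}
        for models in model_sets
        for k in (1, 3, 5)
    ]
    return max(space, key=lambda cfg: score(cfg, benchmark))


benchmark = list(range(200))  # toy "questions" whose answer is the item itself
best = search(benchmark)
print(best, score(best, benchmark))
```

In this toy setup the search simply returns whichever configuration scores highest on the simulated benchmark; a real system would also have to account for the cost of each extra model call.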

Keywords

» Artificial intelligence » Hyperparameter » Inference » Large language model » Optimization
