Summary of No Free Lunch From Random Feature Ensembles, by Benjamin S. Ruben et al.


No Free Lunch From Random Feature Ensembles

by Benjamin S. Ruben, William L. Tong, Hamza Tahir Chaudhry, Cengiz Pehlevan

First submitted to arXiv on: 6 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (original content by GrooveSquid.com)
This research paper investigates the trade-off between training a single large neural network and combining the predictions of many smaller networks. The authors focus on ensembles of random-feature ridge regression models and prove that, when the total number of features is held fixed, a single model with an optimally tuned ridge parameter outperforms any ensemble built from that budget. They also derive scaling laws describing how the test risk of an ensemble decays with its total size, and they identify conditions under which near-optimal performance can still be achieved. Experimental results show that a single large network outperforms any ensemble of networks with the same total number of parameters, provided both are optimally tuned. A short code sketch of this setup appears after the summaries below.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at what happens when you have a limited amount of computing power to train a model. You have to decide whether to spend it all on one big model or to split it across several smaller models that work together. The researchers studied this question for a type of model called random-feature ridge regression. They found that, in most cases, one big model is better than many small ones. They also came up with rules that describe how well an ensemble of models will do based on its size and the kind of task it is trying to solve. To test this, they trained different kinds of models (neural networks and transformers) and found that a single large model usually outperforms many smaller ones.

Keywords

» Artificial intelligence  » Neural network  » Regression  » Scaling laws