Summary of LLM-Select: Feature Selection with Large Language Models, by Daniel P. Jeong et al.


LLM-Select: Feature Selection with Large Language Models

by Daniel P. Jeong, Zachary C. Lipton, Pradeep Ravikumar

First submitted to arXiv on: 2 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper demonstrates an unexpected capability of large language models (LLMs): given only input feature names and a task description, they can select the most predictive features with performance rivaling standard data science tools. The models exhibit this capacity across various query mechanisms, including zero-shot prompting for numerical importance scores. The latest models, such as GPT-4, consistently identify the most predictive features regardless of the query mechanism and prompting strategy. Extensive experiments on real-world data show that LLM-based feature selection achieves strong performance competitive with data-driven methods like LASSO, despite never having seen the downstream training data. This suggests that LLMs may be useful not only for selecting features for training but also for deciding which features to collect in the first place.
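The "zero-shot prompting for numerical importance scores" approach described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `query_llm` is a hypothetical stand-in for any chat-completion API (e.g. a GPT-4 client) and is stubbed here so the example runs; the prompt wording and the 0-to-1 scale are assumptions for illustration.

```python
# Sketch of LLM-based feature selection via zero-shot prompting:
# ask the model for a numerical importance score per feature, using
# only the feature name and a task description (no training data),
# then keep the top-scoring features.

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; a real version would
    # send `prompt` to a model such as GPT-4 and return its reply.
    return "0.5"  # stub so the example is runnable

def llm_feature_importance(feature: str, task: str) -> float:
    """Zero-shot prompt for a numerical importance score in [0, 1]."""
    prompt = (
        f"Task: {task}\n"
        f"Feature: {feature}\n"
        "On a scale from 0 to 1, how important is this feature for "
        "the task? Answer with a single number."
    )
    return float(query_llm(prompt).strip())

def select_top_k(features: list[str], task: str, k: int) -> list[str]:
    """Score every candidate feature and keep the k highest-scoring ones."""
    scores = {f: llm_feature_importance(f, task) for f in features}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: pick 2 of 3 candidate features for a health-prediction task.
candidates = ["age", "blood_pressure", "favorite_color"]
selected = select_top_k(candidates, "predict risk of heart disease", k=2)
```

Because the model sees only feature names and the task description, the same scoring step could in principle be run before any data is collected, which is the basis for the paper's point about deciding which features to gather in the first place.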
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper shows that big language models can do something really cool: they can figure out which pieces of information are most important without seeing any actual data. They just need to know what kind of question is being asked and what the answer might look like. The best models, like GPT-4, can even tell us how important each piece of information is. This could be really helpful for people who collect data in fields like healthcare, where it’s expensive and time-consuming.

Keywords

* Artificial intelligence  * Feature selection  * GPT  * Prompting  * Zero-shot