Summary of Leveraging Offline Data in Linear Latent Bandits, by Chinmaya Kausik et al.
Leveraging Offline Data in Linear Latent Bandits
by Chinmaya Kausik, Kevin Tan, Ambuj Tewari
First submitted to arXiv on: 27 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary The paper's original abstract, available on the paper's arXiv listing |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper studies the latent bandit framework for sequential decision-making domains such as recommender systems, healthcare, and education. The framework models unobserved heterogeneity in the population: an unobserved latent state determines the model for each trajectory. The authors establish a de Finetti theorem for decision processes, showing that every exchangeable and coherent stateless decision process is a latent bandit. They then focus on a linear model with high-dimensional actions in which the latent states lie in an unknown low-dimensional subspace. The paper presents SOLD, a method that learns this subspace from short offline trajectories with guarantees, and two online methods that use the learned subspace, LOCAL-UCB and ProBALL-UCB. Both online methods come with regret guarantees, and movie recommendation is given as an example application. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper explores the “latent bandit” framework, a way of modelling hidden differences between people. It helps explain why different people might respond differently even when they are offered the same choices. The researchers use this idea to build a model that can learn from short records of past interactions. They also develop new ways for computers to make decisions with the model as fresh data arrives, which could be useful for things like movie recommendations. |
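
To make the "unknown low-dimensional subspace" in the medium summary more concrete, here is a minimal Python sketch of a generic spectral subspace estimate on simulated data. It is not the paper's SOLD procedure. The function names (`simulate_offline`, `estimate_subspace`, `subspace_error`), the toy dimensions, and the assumption that logged actions are isotropic Gaussian are all illustrative choices made for this example.

```python
import numpy as np


def simulate_offline(n_users, d_a, d_k, n_pulls, noise_std, rng):
    """Toy offline data: each user's reward vector lies in a shared d_k-dim subspace."""
    # Orthonormal basis of the hidden subspace, shape (d_a, d_k).
    basis, _ = np.linalg.qr(rng.standard_normal((d_a, d_k)))
    # Each user's latent state is a point in that subspace.
    thetas = rng.standard_normal((n_users, d_k)) @ basis.T
    # Assumption: logged actions are isotropic Gaussian, so E[a * r] = theta.
    actions = rng.standard_normal((n_users, n_pulls, d_a))
    rewards = np.einsum("upd,ud->up", actions, thetas)
    rewards += noise_std * rng.standard_normal((n_users, n_pulls))
    return basis, actions, rewards


def estimate_subspace(actions, rewards, d_k):
    """Spectral estimate of the shared subspace from short trajectories."""
    n_users, n_pulls, _ = actions.shape
    half = n_pulls // 2  # requires n_pulls >= 2
    # Two independent moment estimates of theta per user, one per half-trajectory.
    est1 = np.einsum("upd,up->ud", actions[:, :half], rewards[:, :half]) / half
    est2 = np.einsum("upd,up->ud", actions[:, half:], rewards[:, half:]) / (n_pulls - half)
    # Cross-products of independent halves average to E[theta theta^T],
    # avoiding the bias that squaring a single noisy estimate would add.
    second_moment = est1.T @ est2 / n_users
    second_moment = (second_moment + second_moment.T) / 2
    _, eigvecs = np.linalg.eigh(second_moment)
    return eigvecs[:, -d_k:]  # eigh sorts eigenvalues ascending: keep the top d_k


def subspace_error(basis_a, basis_b):
    """Spectral norm of the difference between the two orthogonal projectors."""
    return np.linalg.norm(basis_a @ basis_a.T - basis_b @ basis_b.T, ord=2)


rng = np.random.default_rng(0)
true_basis, actions, rewards = simulate_offline(20000, 10, 2, 4, 0.1, rng)
est_basis = estimate_subspace(actions, rewards, d_k=2)
print("subspace error:", subspace_error(true_basis, est_basis))
```

Each simulated user contributes only four logged interactions, far too few to estimate a 10-dimensional reward vector on its own, yet pooling many users recovers the shared 2-dimensional subspace. An online learner could then work with the 2-dimensional features `actions @ est_basis` instead of the full 10-dimensional actions, which is the intuition behind using offline data to reduce the effective dimension.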