Combinatorial Rising Bandit

by Seockbean Song, Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok

First submitted to arXiv on: 1 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)

The high difficulty version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (GrooveSquid.com, original content)

Combinatorial online learning is the sequential decision-making task of choosing the best combination of base arms under uncertain rewards, with applications in domains such as robotics, social advertising, network routing, and recommendation systems. A common scenario is rising rewards, where selecting a base arm not only yields an immediate reward but also increases its future rewards through practice or social influence. To address this, we introduce the combinatorial rising bandit problem, which aims to minimize policy regret, and propose CRUCB (Combinatorial Rising Upper Confidence Bound), a provably efficient algorithm with a close-to-optimal regret upper bound. Previous studies lack sub-linear regret lower bounds, making their efficiency difficult to assess; we close this gap by providing a sub-linear regret lower bound for the combinatorial rising bandit, and we demonstrate CRUCB’s effectiveness and superiority in both synthetic environments and realistic applications of deep reinforcement learning.

Low Difficulty Summary (GrooveSquid.com, original content)
This paper is about finding the best combination of choices when interacting with uncertain systems that give rewards. This problem appears in many areas, such as robots getting better through practice or social media recommending products based on what people liked before. The key challenge is that the best choice not only gives an immediate reward but also helps get better future rewards. To solve this, we created a new algorithm called CRUCB that can efficiently find the best combination and does better than previous methods.
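To make the setting concrete, here is a minimal illustrative sketch of a UCB-style selector for a combinatorial semi-bandit whose base arms have rising rewards. This is not the paper's actual CRUCB algorithm; the specific rising-reward curve, the top-k selection oracle, and the recency-weighted mean estimate are all simplifying assumptions made for illustration.

```python
import math
import random

# Illustrative sketch only: a UCB-style selector for a combinatorial
# semi-bandit with rising rewards. This is NOT the paper's CRUCB
# algorithm; the reward model and estimator below are assumptions.

def rising_mean(cap, pulls, tau=20.0):
    # Assumed rising-reward curve: an arm's mean reward grows with the
    # number of times it has been pulled, saturating toward `cap`.
    return cap * (1.0 - math.exp(-pulls / tau))

def select_super_arm(n_arms, k, t, pulls, est_mean):
    # UCB index per base arm; the "super arm" is simply the k largest
    # indices (a trivial top-k combinatorial oracle).
    def ucb(i):
        if pulls[i] == 0:
            return float("inf")  # force initial exploration of every arm
        return est_mean[i] + math.sqrt(2.0 * math.log(t + 1) / pulls[i])
    return sorted(range(n_arms), key=ucb, reverse=True)[:k]

def run(n_arms=6, k=2, horizon=500, seed=0):
    rng = random.Random(seed)
    # Arms with higher index can rise to a higher reward ceiling.
    caps = [0.2 + 0.8 * i / (n_arms - 1) for i in range(n_arms)]
    pulls = [0] * n_arms
    est_mean = [0.0] * n_arms
    total = 0.0
    for t in range(horizon):
        for i in select_super_arm(n_arms, k, t, pulls, est_mean):
            r = rising_mean(caps[i], pulls[i]) + rng.gauss(0, 0.05)
            pulls[i] += 1
            # Recency-weighted average: old samples underestimate a
            # rising arm, so discount them (a crude stand-in for the
            # windowed estimates used in rising-bandit algorithms).
            est_mean[i] = 0.9 * est_mean[i] + 0.1 * r
            total += r
    return pulls, total
```

Running `run()` selects k = 2 of 6 base arms per round; because the reward of an arm improves with practice, the selector gradually concentrates its pulls on the arms with the highest reward ceilings.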

Keywords

  • Artificial intelligence
  • Online learning
  • Reinforcement learning