ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
by Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, Pulkit Agrawal
First submitted to arXiv on: 17 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The Online Reward Selection and Policy Optimization (ORSO) framework addresses the challenge of efficiently selecting effective shaping rewards in reinforcement learning. By framing the selection process as an online model selection problem, ORSO automatically identifies high-performing shaping reward functions without human intervention, while providing regret guarantees. Demonstrated across various continuous control tasks, ORSO shows superior data efficiency, reduces computation time by up to 8x, and consistently identifies reward functions that outperform those found by prior methods by more than 50%. |
| Low | GrooveSquid.com (original content) | ORSO helps with reinforcement learning by choosing the right rewards. This matters because most tasks give only sparse feedback, and finding good shaping rewards by hand takes a lot of time and data. ORSO solves this by choosing among candidate rewards online, without needing human help. The method works well across different kinds of control tasks: it is faster, uses less data than other methods, and finds good rewards more often. |
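The medium-difficulty summary describes reward selection as an online model selection problem: candidate shaping rewards compete for a limited training budget, and the selector concentrates effort on the candidates whose trained policies evaluate best. A minimal sketch of that idea, using a generic UCB1 bandit rather than the paper's actual algorithm (the function name, the noise model, and the simulated "evaluation score" are all illustrative assumptions, not details from the paper):

```python
import math
import random

def select_reward_ucb(reward_qualities, budget, seed=0):
    """Toy UCB1-style online selection among candidate shaping rewards.

    Each "arm" is a candidate reward function. Pulling an arm stands in for
    running one slice of policy training under that reward and observing a
    noisy evaluation score in [0, 1] (here simulated from a fixed per-arm
    quality, which a real system would not know).
    """
    rng = random.Random(seed)
    k = len(reward_qualities)
    counts = [0] * k      # training slices allocated to each candidate
    means = [0.0] * k     # running mean of observed evaluation scores

    for t in range(budget):
        if t < k:
            arm = t  # try every candidate once before trusting estimates
        else:
            # Optimism in the face of uncertainty: mean + exploration bonus.
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t + 1) / counts[i]),
            )
        # Simulated noisy evaluation of the policy trained under this reward.
        score = min(1.0, max(0.0, reward_qualities[arm] + rng.gauss(0.0, 0.1)))
        counts[arm] += 1
        means[arm] += (score - means[arm]) / counts[arm]

    best = max(range(k), key=lambda i: means[i])
    return best, counts
```

For example, `select_reward_ucb([0.2, 0.9, 0.4], budget=500)` should allocate most of the budget to the second candidate and return it as the winner; the regret guarantees mentioned in the summary come from bounding how much budget such a selector can waste on inferior candidates.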
Keywords
- Artificial intelligence
- Optimization
- Reinforcement learning