Summary of DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning, by Hao Bai et al.
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
by Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
First submitted to arXiv on: 14 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here. |
Medium | GrooveSquid.com (original content) | Training corpora for vision-language models (VLMs) typically lack sufficient decision-centric data, making off-the-shelf VLMs sub-optimal for decision-making tasks such as in-the-wild device control through graphical user interfaces (GUIs). This paper introduces DigiRL, a novel autonomous RL approach for training in-the-wild device-control agents. A pre-trained VLM is fine-tuned in two stages: offline RL to initialize the model, followed by offline-to-online RL. The authors build a scalable, parallelizable Android learning environment equipped with a VLM-based evaluator and an automatic curriculum for deriving maximal learning signal, and they use advantage-weighted RL with advantage estimators enhanced to account for stochasticity (an illustrative sketch of an advantage-weighted update appears after this table). On the Android-in-the-Wild (AitW) dataset, DigiRL raises the success rate from 17.7% to 67.2%, a 49.5% absolute improvement that surpasses prior best agents and approaches and establishes a new state of the art for digital agents. |
Low | GrooveSquid.com (original content) | This paper is about training machines to control devices on their own. Right now, these machines don’t do very well because they don’t have enough decision-making information to learn from. The researchers developed a new way to train them by fine-tuning a pre-trained model in two stages. They built a special environment for testing and came up with a way to make the learning process more efficient. The results are impressive: their approach is much better than previous methods, and it could lead to major advances in machine learning and in how we interact with devices. |
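For readers who want a concrete picture of the "advantage-weighted RL" mentioned in the medium summary, below is a minimal PyTorch sketch of one AWR-style update step. All names here (`Policy`, `ValueFn`, `awr_update`, `beta`) are hypothetical placeholders rather than the paper's code, and the paper's enhanced advantage estimators, VLM-based evaluator, and automatic curriculum are considerably more involved than the simple return-minus-baseline weighting shown here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the fine-tuned policy and its value baseline;
# in the paper these roles are played by a large pre-trained VLM.
class Policy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Linear(obs_dim, n_actions)

    def log_prob(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        logits = self.net(states)
        return torch.distributions.Categorical(logits=logits).log_prob(actions)

class ValueFn(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Linear(obs_dim, 1)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)

def awr_update(policy, value_fn, optimizer, states, actions, returns,
               beta: float = 1.0, weight_clip: float = 20.0) -> float:
    """One advantage-weighted update: imitate each action in proportion to
    exp(advantage / beta), so high-advantage actions dominate the loss."""
    with torch.no_grad():
        advantages = returns - value_fn(states)             # A = R - V(s)
        weights = torch.exp(advantages / beta).clamp(max=weight_clip)

    log_probs = policy.log_prob(states, actions)            # log pi(a | s)
    loss = -(weights * log_probs).mean()                    # weighted log-likelihood

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data; the value baseline is frozen in this sketch,
# whereas in practice it would be trained alongside the policy.
policy, value_fn = Policy(obs_dim=8, n_actions=4), ValueFn(obs_dim=8)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(32, 8)
actions = torch.randint(0, 4, (32,))
returns = torch.randn(32)
print(awr_update(policy, value_fn, optimizer, states, actions, returns))
```

The design idea is that actions whose observed return exceeds the value baseline receive exponentially larger imitation weight, so the policy drifts toward behavior that outperformed expectations without explicitly penalizing everything else.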
Keywords
- Artificial intelligence
- Fine-tuning
- Machine learning