Summary of Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces, by Perusha Moodley et al.
Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
by Perusha Moodley, Pramod Kaushik, Dhillu Thambi, Mark Trovinger, Praveen Paruchuri, Xia Hong, Benjamin Rosman
First submitted to arXiv on: 1 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Decision Transformers struggle in image-based environments with multi-discrete action spaces, even with enhanced architectures. Our proposed Multi-State Action Tokenisation (M-SAT) addresses this by tokenising actions at the individual-action level and incorporating auxiliary state information (a code sketch illustrating the idea follows this table). This disentangles actions, improving their interpretability and visibility within attention layers. We demonstrate M-SAT’s performance gains on challenging ViZDoom environments with multi-discrete action spaces, where it outperforms the baseline Decision Transformer without additional data or computational overhead. Surprisingly, removing positional encoding can even improve M-SAT’s performance in some cases. |
Low | GrooveSquid.com (original content) | Decision Transformers have trouble working with images and lots of different actions. We created a new method called Multi-State Action Tokenisation (M-SAT) that helps by breaking actions down into smaller parts and adding extra information about what is happening. This makes it easier to understand what the model is doing and why. Our tests show that M-SAT works better than regular Decision Transformers in tricky situations, and it doesn’t need extra data or powerful computers. |
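To make the tokenisation idea concrete, here is a minimal sketch of what per-component action tokenisation could look like in PyTorch. The class name, dimensions, and the [3, 3, 2] action layout are illustrative assumptions, not the paper's implementation; M-SAT also incorporates auxiliary state information, which is omitted here.

```python
# Minimal sketch (assumed names and dimensions, not the authors' code) of
# per-component action tokenisation for a multi-discrete action space.
import torch
import torch.nn as nn

class MultiDiscreteActionTokeniser(nn.Module):
    """Embed each component of a multi-discrete action as its own token,
    rather than collapsing the whole action vector into a single token."""

    def __init__(self, nvec, d_model):
        super().__init__()
        # One embedding table per action dimension, e.g. nvec = [3, 3, 2]
        # for a hypothetical ViZDoom-style action space.
        self.tables = nn.ModuleList([nn.Embedding(n, d_model) for n in nvec])

    def forward(self, actions):
        # actions: (batch, num_dims) integer tensor.
        # Returns (batch, num_dims, d_model): one token per sub-action,
        # ready to be interleaved into the Decision Transformer sequence.
        tokens = [table(actions[:, i]) for i, table in enumerate(self.tables)]
        return torch.stack(tokens, dim=1)

tokeniser = MultiDiscreteActionTokeniser(nvec=[3, 3, 2], d_model=128)
action = torch.tensor([[2, 0, 1]])   # one multi-discrete action
print(tokeniser(action).shape)       # torch.Size([1, 3, 128])
```

Each sub-action then appears as a separate token in the transformer's input sequence, which is what allows the attention layers to attend to individual actions, the property the summary describes as disentangling actions and improving their visibility.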
Keywords
* Artificial intelligence
* Attention
* Positional encoding
* Transformer