Summary of Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training, by Junxiao Shen et al.
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
by Junxiao Shen, Khadija Khaldi, Enmin Zhou, Hemant Bhaskar Surale, Amy Karlson
First submitted to arXiv on: 8 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Graphics (cs.GR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper proposes a generalizable neural decoder that converts word-gesture keyboard trajectories into text in Extended Reality (XR) systems. Existing template-matching methods such as SHARK^2 are prone to inaccuracies on noisy trajectories, while conventional neural-network decoders require extensive data and deep-learning expertise, making them challenging to adopt. To address these limitations, the authors pre-train a neural decoder on large-scale, coarsely discretized word-gesture trajectories. This approach achieves high decoding accuracy (90.4%) across four diverse datasets, outperforming SHARK^2 by 37.2% and a conventional neural decoder by 7.4%. The Pre-trained Neural Decoder is compact (4 MB after quantization) and runs in real time (97 milliseconds on Quest 3), making it well suited to XR applications. (A toy sketch of coarse discretization appears after this table.) |
| Low | GrooveSquid.com (original content) | The paper finds a way to make computers better at understanding the hand gestures used with special keyboards in virtual or augmented reality. Right now, these systems are not very good at recognizing what we mean when we type with our hands. The problem is that everyone gestures a little differently, so it's hard for the computer to figure out what we want. Some methods are easy to use but don't work well; others work better but require special training data and expertise. To solve this, the authors created a new way to teach computers to recognize hand gestures that works well across different systems and is easy to use. |
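The summaries name trajectory coarse discretization as the key idea but do not spell out the scheme, so here is a minimal, hypothetical sketch in Python: each trajectory point is snapped to the nearest key center on a unit-grid QWERTY layout, and consecutive repeats are collapsed into a coarse token sequence. The `KEY_CENTERS` layout and the `coarse_discretize` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical QWERTY layout: key -> (x, y) center on a unit grid,
# with each row staggered by a quarter key width (an assumption,
# not the paper's actual keyboard geometry).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_CENTERS = {
    ch: (col + 0.5 + 0.25 * row, float(row))
    for row, keys in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(keys)
}

def coarse_discretize(trajectory):
    """Snap each (x, y) point to its nearest key and collapse
    consecutive repeats, yielding a coarse discrete token sequence."""
    keys = list(KEY_CENTERS)
    centers = np.array([KEY_CENTERS[k] for k in keys])
    tokens = []
    for point in np.asarray(trajectory, dtype=float):
        nearest = keys[int(np.argmin(np.linalg.norm(centers - point, axis=1)))]
        if not tokens or tokens[-1] != nearest:
            tokens.append(nearest)
    return tokens

# A noisy trace roughly sweeping over h -> e -> l -> l -> o.
trace = [(5.7, 1.0), (2.5, 0.1), (8.7, 1.0), (8.8, 0.9), (8.5, 0.2)]
print(coarse_discretize(trace))  # ['h', 'e', 'l', 'o']
```

Coarse tokens like these are far more robust to trajectory noise than raw coordinates, which is presumably what allows a decoder pre-trained on them to generalize across datasets and input devices.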
Keywords
» Artificial intelligence » Decoder » Deep learning » Neural network » Quantization » Template matching