

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition – And Ways to Overcome Them

by Harish Haresamudram, Apoorva Beedu, Mashfiqui Rabbi, Sankalita Saha, Irfan Essa, Thomas Ploetz

First submitted to arXiv on: 21 Aug 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Cross-modal contrastive pre-training has achieved impressive results across a range of tasks and domains by pairing modalities such as vision and audio with natural language supervision. This paper investigates whether similar approaches can be applied to wearable sensor-based Human Activity Recognition (HAR) and finds that, surprisingly, they perform substantially worse than standard end-to-end training and self-supervision. The authors attribute this disparity primarily to sensor heterogeneity and the lack of rich text descriptions of activities. To mitigate these issues, they develop strategies and assess them in an extensive experimental evaluation, achieving significant gains in activity recognition and enabling the recognition of unseen activities and cross-modal retrieval of videos. Overall, this work paves the way toward better sensor-language learning, ultimately contributing to the development of foundational models for HAR using wearables.
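
To make the idea of cross-modal contrastive pre-training concrete, here is a minimal PyTorch sketch of a CLIP-style symmetric contrastive loss between embeddings of wearable sensor windows and embeddings of activity descriptions. The encoder architecture, the names (SensorEncoder, clip_loss), and all hyperparameters are illustrative assumptions for exposition, not the authors’ implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorEncoder(nn.Module):
    """Toy 1D-convolutional encoder for windows of inertial sensor data."""
    def __init__(self, in_channels=6, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the time axis
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):              # x: (batch, channels, time)
        h = self.conv(x).squeeze(-1)   # -> (batch, 64)
        return self.proj(h)            # -> (batch, embed_dim)

def clip_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (sensor, text) pairs."""
    s = F.normalize(sensor_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature     # (batch, batch) cosine similarities
    labels = torch.arange(len(s))      # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

# Usage with random stand-ins for sensor windows and text embeddings:
batch, channels, time_steps, embed_dim = 8, 6, 100, 128
sensor_emb = SensorEncoder(channels, embed_dim)(
    torch.randn(batch, channels, time_steps))
text_emb = torch.randn(batch, embed_dim)  # would come from a text encoder
loss = clip_loss(sensor_emb, text_emb)

The loss pulls each sensor window toward the embedding of its paired activity description and pushes it away from all other descriptions in the batch; the paper’s finding is that, applied naively to wearable data, this objective underperforms end-to-end training.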
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper explores how computers can recognize human activities like walking or running from wearable sensors. It finds that a popular method called cross-modal contrastive pre-training doesn’t work as well as expected for this task, partly because sensor data differs a lot between devices and partly because there isn’t rich language data describing what different activities look like. To fix this, the researchers developed new strategies and tested them, showing significant improvements in activity recognition. This work can lead to better systems that learn from wearables to recognize human activities.

Keywords

  • Artificial intelligence
  • Activity recognition