

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition – And Ways to Overcome Them

by Harish Haresamudram, Apoorva Beedu, Mashfiqui Rabbi, Sankalita Saha, Irfan Essa, Thomas Ploetz

First submitted to arXiv on: 21 Aug 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Cross-modal contrastive pre-training has achieved impressive results across a range of tasks and domains by pairing modalities such as vision and audio with natural language supervision. This paper investigates whether similar approaches can be applied to wearable sensor-based Human Activity Recognition (HAR) and finds that, surprisingly, they perform substantially worse than standard end-to-end training and self-supervision. The authors attribute this disparity primarily to sensor heterogeneity and the lack of rich text descriptions of activities. To mitigate these issues, they develop strategies and assess them in an extensive experimental evaluation, achieving significant gains in activity recognition and enabling the recognition of unseen activities and cross-modal retrieval of videos. Overall, this work paves the way toward better sensor-language learning, ultimately contributing to the development of foundational models for HAR using wearables.
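
To make the idea of cross-modal contrastive pre-training concrete, here is a minimal PyTorch sketch of a CLIP-style symmetric contrastive loss between embeddings of wearable sensor windows and embeddings of activity descriptions. The encoder architecture, the names (SensorEncoder, clip_loss), and all hyperparameters are illustrative assumptions for exposition, not the authors’ implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorEncoder(nn.Module):
    """Toy 1D-convolutional encoder for windows of inertial sensor data."""
    def __init__(self, in_channels=6, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the time axis
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):              # x: (batch, channels, time)
        h = self.conv(x).squeeze(-1)   # -> (batch, 64)
        return self.proj(h)            # -> (batch, embed_dim)

def clip_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (sensor, text) pairs."""
    s = F.normalize(sensor_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature     # (batch, batch) cosine similarities
    labels = torch.arange(len(s))      # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

# Usage with random stand-ins for sensor windows and text embeddings:
batch, channels, time_steps, embed_dim = 8, 6, 100, 128
sensor_emb = SensorEncoder(channels, embed_dim)(
    torch.randn(batch, channels, time_steps))
text_emb = torch.randn(batch, embed_dim)  # would come from a text encoder
loss = clip_loss(sensor_emb, text_emb)

The loss pulls each sensor window toward the embedding of its paired activity description and pushes it away from all other descriptions in the batch; the paper’s finding is that, applied naively to wearable data, this objective underperforms end-to-end training.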
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper explores how computers can recognize human activities like walking or running from wearable sensors. It finds that a popular method called cross-modal contrastive pre-training doesn’t work as well as expected for this task, partly because sensor data differs a lot between devices and partly because there isn’t rich language data describing what different activities look like. To fix this, the researchers developed new strategies and tested them, showing significant improvements in activity recognition. This work can lead to better systems that learn from wearables to recognize human activities.

Keywords

  • Artificial intelligence
  • Activity recognition