

Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder

by Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan

First submitted to arXiv on: 20 Sep 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each of the summaries below covers the same AI paper at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.
Medium Difficulty Summary (original content by GrooveSquid.com)

This paper investigates the use of foundation models across three modalities (speech, video, and text) to analyze child-focused interaction sessions involving children with Autism Spectrum Disorder. The authors propose a unified methodology that combines the modalities by using large language models as reasoning agents. They evaluate the approach on two tasks, activity recognition and abnormal behavior detection, and show that the multimodal pipeline is robust to modality-specific limitations and outperforms unimodal settings on clinical video analysis. A short, hypothetical code sketch of this fusion strategy appears after the summaries below.
Low Difficulty Summary (original content by GrooveSquid.com)

This paper helps doctors and researchers better understand children with Autism Spectrum Disorder by using computers to analyze videos of kids interacting with caregivers or professionals. Right now, people have to watch these long videos and write down what they see, which is time-consuming and requires special expertise. The authors want to use powerful computer models that learn from big datasets to help analyze these interactions. They test their approach on two tasks: recognizing what is happening in the video (like “the child is playing”) and detecting unusual behavior. By combining different types of data, such as speech, text, and video, they show that computers do a better job than when looking at only one type of data.
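
The summaries describe a pipeline in which modality-specific foundation models first turn speech and video into text, and a large language model then reasons over the combined evidence. The sketch below illustrates that idea only; the function names, prompt wording, and the query_llm stub are hypothetical placeholders, not the authors' models or code.

```python
# Hypothetical sketch of an LLM-as-reasoning-agent multimodal pipeline.
# All extractors and query_llm() are placeholder stubs, not the paper's
# actual foundation models or implementation.

def transcribe_speech(audio_path: str) -> str:
    """Placeholder for a speech foundation model (e.g. an ASR system)."""
    return "Adult: Can you show me the ball? Child: (no verbal response)"

def describe_video(video_path: str) -> str:
    """Placeholder for a video foundation model producing a caption."""
    return "A child sits at a table; an adult holds out a toy ball."

def query_llm(prompt: str) -> str:
    """Stub standing in for a call to any chat-style LLM endpoint."""
    return "Label: free play. Justification: the adult offers a toy."

def combine_with_llm(transcript: str, video_caption: str, task: str) -> str:
    """Fuse modality-specific evidence in one text prompt and let the
    language model reason over it (late fusion via prompting)."""
    prompt = (
        f"Session transcript:\n{transcript}\n\n"
        f"Video description:\n{video_caption}\n\n"
        f"Task: {task}\n"
        "Answer with a single label and a one-sentence justification."
    )
    return query_llm(prompt)

if __name__ == "__main__":
    transcript = transcribe_speech("session.wav")   # hypothetical input file
    caption = describe_video("session.mp4")         # hypothetical input file
    print(combine_with_llm(transcript, caption, "activity recognition"))
```

Because the fusion happens in plain text, a noisy or missing modality degrades into a weaker description rather than breaking the pipeline, which is consistent with the robustness to modality-specific limitations reported in the summaries.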

Keywords

* Artificial intelligence
* Activity recognition