Enhancing Screen Time Identification in Children with a Multi-view Vision Language Model and Screen Time Tracker

by Xinlong Hou, Sen Shen, Xueshen Li, Xinran Gao, Ziyi Huang, Steven J. Holiday, Matthew R. Cribbet, Susan W. White, Edward Sazonov, Yu Gan

First submitted to arXiv on: 2 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, which can be read on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper presents a novel sensor informatics framework for accurately monitoring young children’s screen exposure, which is crucial for understanding phenomena linked to screen use such as childhood obesity, physical activity, and social interaction. The proposed framework combines egocentric images from a wearable sensor, the Screen Time Tracker (STT), with a Vision Language Model (VLM). A multi-view VLM was designed to interpret screen exposure dynamically by processing multiple views drawn from image sequences. Experimental results on a dataset of children’s free-living activities demonstrate significant improvements over existing approaches based on plain vision language models and object detection models, highlighting the potential of this framework for optimizing behavioral research on screen exposure in naturalistic settings.
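
The summary site does not reproduce the authors’ code, but the multi-view idea can be illustrated with a short sketch: batch consecutive egocentric views, ask a VLM whether a screen is being watched, and sum the exposed intervals. Everything below is an assumption made for illustration (the `query_vlm` stub, the five-second sampling interval, the window size of three views), not the paper’s implementation.

```python
# Illustrative sketch only; not the authors' implementation.
# The query_vlm stub, sampling interval, and window size are assumptions.

from dataclasses import dataclass
from typing import List

SAMPLE_INTERVAL_S = 5.0  # assumed STT capture rate: one egocentric image per 5 s


@dataclass
class View:
    """One egocentric image captured by the wearable Screen Time Tracker."""
    image_path: str
    timestamp_s: float


def query_vlm(image_paths: List[str], prompt: str) -> str:
    """Stand-in for a multi-view VLM call that receives several consecutive
    views at once. Returns a canned answer so the sketch runs end to end;
    replace with a real vision language model in practice."""
    return "no"


def estimate_screen_time(views: List[View], window: int = 3) -> float:
    """Sum estimated screen-exposure time (in seconds) over batched views.

    Consecutive views are grouped so the model can reason over temporal
    context, mirroring the multi-view idea described in the summary.
    """
    exposed_s = 0.0
    for i in range(0, len(views), window):
        batch = views[i:i + window]
        answer = query_vlm(
            [v.image_path for v in batch],
            prompt="Across these consecutive views, is the child "
                   "watching a screen? Answer yes or no.",
        )
        if answer.strip().lower().startswith("yes"):
            exposed_s += len(batch) * SAMPLE_INTERVAL_S
    return exposed_s
```

Grouping views rather than classifying single frames is what a multi-view design buys: the model can draw on temporal context, such as a screen that is partly occluded in one frame but visible in its neighbors.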
Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about creating a better way to measure how much time kids spend looking at screens. Right now, most studies rely on parents or researchers counting how long kids look at their phones or tablets, which isn’t very accurate. To fix this, the authors developed a new system that uses pictures taken by a special device worn by the child. These pictures are then analyzed using a special kind of computer program called a Vision Language Model. The model is trained to recognize different views and scenes from the pictures, and it can figure out how much time the child spends looking at screens. The authors tested their system with real data from kids doing everyday activities and found that it was way more accurate than other methods. This could be really helpful for researchers who want to understand how screen use affects kids’ lives.
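
To make the “figure out how much time” step concrete, here is a hypothetical invocation of the sketch above. All numbers are made up for illustration, and with the canned stub it reports zero exposure until a real VLM is wired in.

```python
# Hypothetical usage of the estimate_screen_time sketch above.
# Assumed capture: one image every 5 s for an hour -> 720 views.
views = [View(image_path=f"frame_{i:04d}.jpg", timestamp_s=i * 5.0)
         for i in range(720)]
total_s = estimate_screen_time(views)
print(f"Estimated screen exposure: {total_s / 60:.1f} minutes")
```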

Keywords

  • Artificial intelligence
  • Language model
  • Object detection