
Summary of FabuLight-ASD: Unveiling Speech Activity via Body Language, by Hugo Carneiro and Stefan Wermter


FabuLight-ASD: Unveiling Speech Activity via Body Language

by Hugo Carneiro, Stefan Wermter

First submitted to arXiv on: 20 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces FabuLight-ASD, an active speaker detection (ASD) model that combines facial, audio, and body pose information to improve detection accuracy and robustness. Building on the existing Light-ASD framework, FabuLight-ASD adds human pose data represented as skeleton graphs, a representation that minimizes computational overhead. The model is evaluated on the Wilder Active Speaker Detection (WASD) dataset, which is known for its reliable face and body bounding box annotations. FabuLight-ASD achieves an overall mean average precision (mAP) of 94.3%, compared with 93.7% for Light-ASD. Body pose information helps most in difficult conditions, with notable mAP improvements in scenarios involving speech impairment, face occlusion, and human voice background noise. An efficiency analysis shows only a modest increase in parameter count (27.3%) and in multiply-accumulate operations (up to 2.4%). These findings support the use of body pose data to improve ASD performance.

Low Difficulty Summary (written by GrooveSquid.com, original content)
FabuLight-ASD is a new way for computers to detect whether someone is talking, using information from faces, voices, and body language. This helps with things like video conferencing and robots that can understand humans. The researchers built on an earlier model called Light-ASD by adding data about people's body poses. They tested it on a dataset with many examples and found that it was slightly more accurate than the previous model (94.3% versus 93.7% mAP). The gains were most noticeable when speech was impaired, faces were hidden, or other voices were in the background. It adds only a little extra computation, so it can still be used in real-life situations.
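
For readers unfamiliar with the mAP figures quoted in the summaries, here is a minimal Python sketch of how average precision can be computed from a model's confidence scores. It is an illustration only: the function names are hypothetical, it uses the simple non-interpolated definition of average precision, and the paper's actual evaluation protocol (not described on this page) may differ in details such as interpolation or how categories are averaged.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one set of predictions.

    scores: confidence that each face track frame is "speaking"
    labels: 1 if the person is actually speaking, 0 otherwise
    (Both names are illustrative assumptions, not taken from the paper.)
    """
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        return 0.0  # convention chosen here: no positives means AP of 0

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank: correct positives so far / items so far.
            precision_sum += hits / rank
    return precision_sum / total_positives


def mean_average_precision(groups):
    """Mean of per-group AP values; each group is a (scores, labels) pair."""
    aps = [average_precision(scores, labels) for scores, labels in groups]
    return sum(aps) / len(aps)
```

Calling average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]) averages the precision at the three ranks where a true speaker appears (1/1, 2/3, and 3/4). A score of 1.0 means every speaking example was ranked above every non-speaking one.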

Keywords

» Artificial intelligence  » Bounding box  » Mean average precision