


Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision

by Elena Ryumina, Maxim Markitantov, Dmitry Ryumin, Heysem Kaya, Alexey Karpov

First submitted to arXiv on: 19 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The SUN team presents its results on the Compound Expressions Recognition Challenge at the 6th ABAW Competition. The team proposes an audio-visual method for recognizing compound expressions that fuses the modalities at the level of emotion probabilities. Because the method requires no training data specific to the target task, the problem is treated as zero-shot classification. The proposed method achieves an F1 score of 22.01% on the C-EXPR-DB test subset. The results demonstrate the potential of such methods for building intelligent tools that annotate audio-visual data and enable more accurate recognition of human emotions.
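The general idea of probability-level late fusion followed by a rule-based decision can be illustrated with a short sketch. Everything below is an illustrative assumption rather than the authors' implementation: the basic-emotion label set, the fusion weight, and the sum-of-constituents rule for scoring compound expressions are all stand-ins for whatever the paper actually uses.

```python
import numpy as np

# Basic emotions assumed for illustration; the paper's label set may differ.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Compound expressions defined as pairs of basic emotions. The pairing
# below is a common convention for compound labels, not necessarily the
# authors' exact rule set.
COMPOUNDS = {
    "fearfully_surprised": ("fear", "surprise"),
    "happily_surprised": ("happiness", "surprise"),
    "sadly_surprised": ("sadness", "surprise"),
    "disgustedly_surprised": ("disgust", "surprise"),
    "angrily_surprised": ("anger", "surprise"),
    "sadly_fearful": ("sadness", "fear"),
    "sadly_angry": ("sadness", "anger"),
}

def late_fusion(p_audio, p_video, w_audio=0.5):
    """Late fusion: weighted average of per-modality emotion probabilities."""
    fused = w_audio * np.asarray(p_audio) + (1.0 - w_audio) * np.asarray(p_video)
    return fused / fused.sum()  # renormalize to a probability vector

def predict_compound(p_fused):
    """Rule-based decision (assumed): score each compound by the sum of its
    two constituent basic-emotion probabilities, then pick the highest."""
    idx = {e: i for i, e in enumerate(EMOTIONS)}
    scores = {c: p_fused[idx[a]] + p_fused[idx[b]]
              for c, (a, b) in COMPOUNDS.items()}
    return max(scores, key=scores.get)

# Example: the audio model favors happiness, the video model surprise,
# so the fused probabilities point to a "happily surprised" compound.
p_a = [0.05, 0.05, 0.05, 0.60, 0.05, 0.20]
p_v = [0.05, 0.05, 0.05, 0.20, 0.05, 0.60]
print(predict_compound(late_fusion(p_a, p_v)))  # → happily_surprised
```

The key property this sketch shares with the paper's approach is that no compound-expression classifier is ever trained: only per-modality basic-emotion probabilities are needed, and the compound label falls out of the fusion and the decision rule, which is what makes the task zero-shot.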
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about recognizing compound expressions from sound and images. The team proposes a new way to do this by combining information from both modalities at the level of emotion probabilities. The approach doesn't require any training specific to the target task. The method achieves promising results, showing that it could be used to build tools that help annotate audio-visual data and better understand human emotions.

Keywords

* Artificial intelligence  * Classification  * F1 score  * Probability  * Zero shot