
Fusion approaches for emotion recognition from speech using acoustic and text-based features

by Leonardo Pepino, Pablo Riera, Luciana Ferrer, Agustin Gravano

First submitted to arXiv on: 27 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Sound (cs.SD); Audio and Speech Processing (eess.AS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This machine learning study proposes using contextualized word embeddings from BERT to improve emotion classification from speech. The approach combines acoustic and text-based features and outperforms traditional word representations such as GloVe embeddings. The paper evaluates several fusion strategies on the IEMOCAP and MSP-PODCAST datasets, finding that combining the two modalities is beneficial, with only subtle differences between the fusion approaches. The study also highlights the importance of how cross-validation folds are defined in emotion classification tasks, cautioning that poorly chosen folds can lead to overestimated model performance.

Low Difficulty Summary (written by GrooveSquid.com; original content)
This research explores ways to recognize emotions in spoken language. The scientists developed a new way to represent the words in speech transcriptions using BERT, which improved results. They also tested different methods for combining audio and text information and found that mixing the two is helpful. The study used the IEMOCAP and MSP-PODCAST datasets to test the approach.
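The fusion of acoustic and text-based features described in the summaries can be sketched as simple feature-level concatenation. This is a minimal illustration under stated assumptions, not the authors' exact pipeline: the paper compares several fusion strategies, and the 88-dimensional acoustic vector and random placeholder values here are hypothetical, while 768 is the standard BERT-base embedding size.

```python
import numpy as np

def fuse_features(acoustic_feats: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate per-utterance acoustic features
    with a contextualized text embedding (e.g., a pooled BERT vector).
    The fused vector would then feed a downstream emotion classifier."""
    return np.concatenate([acoustic_feats, text_embedding])

# Hypothetical inputs: an 88-dim acoustic feature vector and a 768-dim
# BERT-base sentence embedding (placeholder random values for illustration).
acoustic = np.random.randn(88)
text = np.random.randn(768)
fused = fuse_features(acoustic, text)
print(fused.shape)  # (856,)
```

In practice the acoustic features would come from the speech signal and the text embedding from BERT applied to the utterance transcription; other fusion strategies evaluated in such work operate at the model or decision level rather than by concatenating raw feature vectors.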

Keywords

* Artificial intelligence  * BERT  * Classification  * GloVe  * Machine learning