Summary of New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis, by Quy Hoang Nguyen et al.
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
by Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen
First submitted to arXiv on: 1 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The proposed Fine-Grained Cross-Modal Fusion Framework (FCMF) is a novel approach to Aspect-Category Sentiment Analysis (ACSA) that effectively learns intra- and inter-modality interactions from multimodal data. To evaluate this framework, the authors introduce ViMACSA, a Vietnamese multimodal dataset consisting of 4,876 text-image pairs with fine-grained annotations for both text and image in the hotel domain. FCMF outperforms state-of-the-art models on ViMACSA, achieving an F1 score of 79.73%. This work also explores characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and language complexities. |
| Low | GrooveSquid.com (original content) | The paper introduces a new dataset and framework for Aspect-Category Sentiment Analysis (ACSA) on social media platforms. The goal is to better understand user sentiments toward specific topics or “aspects”. To do this, the authors created a Vietnamese multimodal dataset called ViMACSA, which includes text-image pairs from the hotel domain. They also propose a new framework that combines information from both text and images, and it outperforms other models on the ViMACSA dataset. The paper highlights the challenges of working with Vietnamese language data, including misspellings and abbreviations. |
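The summaries describe the framework as learning "inter-modality interactions" between text and images. The paper's actual architecture is not detailed here, but the general idea of one modality attending over another can be illustrated with a minimal cross-modal attention sketch (all function and variable names below are illustrative, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, image_feats):
    """Let each text token attend over image regions (a generic
    inter-modality interaction, not the paper's exact FCMF design).
    Shapes: text_feats (T, d), image_feats (R, d).
    Returns (T, 2d): each text feature concatenated with its
    image-aware context vector."""
    d = text_feats.shape[-1]
    scores = text_feats @ image_feats.T / np.sqrt(d)  # (T, R) similarity
    weights = softmax(scores, axis=-1)                # attention over regions
    context = weights @ image_feats                   # (T, d) fused context
    return np.concatenate([text_feats, context], axis=-1)

# Toy example: 4 text tokens, 3 image regions, 8-dim features.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
image = rng.standard_normal((3, 8))
fused = cross_modal_attention(text, image)
print(fused.shape)  # (4, 16)
```

A fine-grained variant, as the dataset's aspect-level annotations suggest, would apply such interactions per aspect category rather than once over the whole text.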
Keywords
» Artificial intelligence » F1 score