Summary of FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering, by Siqiao Xue et al.
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
by Siqiao Xue, Tingting Chen, Fan Zhou, Qingyang Dai, Zhixuan Chu, Hongyuan Mei
First submitted to arXiv on: 6 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
| --- | --- | --- |
| High | Paper authors | The paper's original abstract, available on the arXiv listing. |
| Medium | GrooveSquid.com (original content) | The paper introduces FAMMA, an open-source benchmark for financial multilingual multimodal question answering (QA). It evaluates the ability of multimodal large language models (MLLMs) to answer questions that require advanced financial knowledge and sophisticated reasoning. The benchmark consists of 1,758 meticulously collected question-answer pairs drawn from university textbooks and exams, spanning 8 major subfields in finance, with questions presented in a mixed format that combines text and heterogeneous image types. Even state-of-the-art proprietary MLLMs such as GPT-4o and Claude-35-Sonnet achieve only 42% accuracy on FAMMA, and open-source models lag notably behind these proprietary systems. The paper also explores GPT o1-style reasoning chains to enhance the models' reasoning capabilities, which improves their error correction. |
| Low | GrooveSquid.com (original content) | FAMMA is a new way to test how well machines can answer questions about finance. It's like a big quiz with lots of questions and answers written in different languages, including English, Chinese, and French. The questions have pictures too, like charts and diagrams. Right now, even the best machine models get only about 42% of the answers correct, which means they still have a lot to learn. To help them improve, the paper shows how machines can use special reasoning steps to fix their mistakes. |
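To make the evaluation setup concrete, below is a minimal sketch of the kind of accuracy scoring the benchmark implies: iterate over mixed text-and-image question-answer pairs, query a multimodal model, and compare its predictions against the reference answers. The field names and the `query_mllm` stub are hypothetical placeholders, not part of the paper or any released FAMMA code.

```python
# Illustrative sketch of a FAMMA-style evaluation loop (not the authors' code).
# Assumptions: the field names "question", "images", "answer" and the
# query_mllm() stub are hypothetical stand-ins for a real MLLM client.

from typing import Dict, List


def query_mllm(question: str, images: List[bytes]) -> str:
    """Placeholder for a call to a multimodal LLM (e.g., GPT-4o or Claude).

    A real implementation would send the question text together with the
    attached images and return the model's free-form answer string.
    """
    raise NotImplementedError("Plug in your MLLM client here.")


def evaluate(examples: List[Dict]) -> float:
    """Score accuracy over mixed text-and-image QA pairs."""
    correct = 0
    for ex in examples:
        prediction = query_mllm(ex["question"], ex.get("images", []))
        # FAMMA mixes question styles; a normalized exact match is used
        # here purely for illustration of the accuracy metric.
        if prediction.strip().lower() == ex["answer"].strip().lower():
            correct += 1
    return correct / len(examples)
```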
Keywords
» Artificial intelligence » Claude » GPT » Question answering