
Summary of SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction, by Ethan Bradley et al.


SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction

by Ethan Bradley, Muhammad Roman, Karen Rafferty, Barry Devereux

First submitted to arXiv on: 5 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, which is available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper addresses the challenge of extracting tables from document images across different content domains, where labelled data is hard to come by. Current datasets are limited because they rely on unreliable OCR to extract features, such as the words in each table, for training models. The authors propose SynFinTabs, a large-scale labelled dataset of synthetic financial tables, and hope that their method of generating these tables transfers to other domains. To demonstrate the effectiveness of the dataset, they developed FinTabQA, a layout-based language model trained on an extractive question-answering task. They test the model on real-world financial tables, compare it to a state-of-the-art generative model, and discuss the results. The authors make their dataset, model, and code publicly available.
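To make "extractive question answering over a table" more concrete, here is a minimal sketch of what such training examples could look like. It is not the authors' SynFinTabs generator (which renders actual table images); the fixed grid layout, the question template and the names `Word`, `QAExample`, `build_qa` and `answer_text` are all hypothetical. The point it illustrates is that when a table is generated rather than scanned, every word and its bounding box are known exactly, so no OCR is needed, and each answer is simply a span of those words.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) on a hypothetical pixel grid


@dataclass
class QAExample:
    question: str
    start: int  # index of the first answer word
    end: int    # index of the last answer word (inclusive)


def build_qa(rows, col_width=120, row_height=20, char_width=7):
    """Lay out a table on a fixed grid and emit extractive QA examples.

    Assumed layout: rows[0] is the header row, rows[r][0] is each row's label.
    """
    words = []
    spans = {}  # (row, col) -> (first word index, last word index)
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            x, y = c * col_width, r * row_height
            first = len(words)
            for token in cell.split():
                x1 = x + len(token) * char_width
                words.append(Word(token, (x, y, x1, y + row_height)))
                x = x1 + char_width  # one character of space between tokens
            if len(words) > first:
                spans[(r, c)] = (first, len(words) - 1)

    header = rows[0]
    examples = []
    for r in range(1, len(rows)):
        for c in range(1, len(header)):
            if (r, c) not in spans:
                continue  # empty cell: nothing to extract
            # Hypothetical question template built from row label and column header.
            question = f"What is the value of {rows[r][0]} for {header[c]}?"
            examples.append(QAExample(question, *spans[(r, c)]))
    return words, examples


def answer_text(words, example):
    """Recover the answer string from its word span."""
    return " ".join(w.text for w in words[example.start:example.end + 1])


rows = [
    ["", "2023", "2022"],
    ["Turnover", "1,250", "1,100"],
    ["Cost of sales", "(800)", "(700)"],
]
words, examples = build_qa(rows)
for ex in examples:
    print(ex.question, "->", answer_text(words, ex))
```

Running it prints four question/answer pairs, for example `What is the value of Turnover for 2023? -> 1,250`. A layout-aware extractive model would then be trained to predict the `start` and `end` word indices from the question, the words and their boxes, rather than generating the answer text freely.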

Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine trying to find specific information in a table that's hard to read in a scanned document. This paper tackles that problem by creating a big library of fake tables that come with the correct answers. The authors want to help machines learn to pull information out of tables like these, using a method that could also work for other types of documents. To do this, they built a special kind of computer model that answers questions using the table's layout. They tested this model on real-world financial tables, and the results show it can be very accurate.

Keywords

» Artificial intelligence  » Feature extraction  » Generative model  » Language model  » Question answering