Summary of Federated Document Visual Question Answering: A Pilot Study, by Khanh Nguyen and Dimosthenis Karatzas
Federated Document Visual Question Answering: A Pilot Study
by Khanh Nguyen, Dimosthenis Karatzas
First submitted to arXiv on: 10 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper addresses a limitation in document analysis research: documents are often copyrighted or contain private information, which makes it difficult to assemble centralized datasets. To overcome this challenge, the authors propose federated learning (FL) to train a shared model on decentralized private document data. The focus is on Document VQA (Visual Question Answering), which requires diverse reasoning capabilities across different domains, and training over heterogeneous document datasets can substantially enrich DocVQA models. The authors assemble existing DocVQA datasets from diverse domains and explore self-pretraining techniques in a multi-modal setting. They also propose combining self-pretraining with federated DocVQA training using centralized adaptive optimization, which outperforms the FedAvg baseline (a minimal sketch of that server-side aggregation idea follows the table). Through extensive experiments, the paper presents a multi-faceted analysis of training DocVQA models with FL, providing insights for future research. |
| Low | GrooveSquid.com (original content) | Imagine you want to train computers to read and understand documents from different places around the world. But these documents are often private or owned by someone else, which makes it hard to share them. The authors of this paper suggest a special way to train computers called federated learning. It lets many computers work together, even if they have different data, to create one shared model that can understand and answer questions about documents from anywhere. They also explore ways to make these models better by training them on multiple types of data at once. The paper shows that this approach is effective and could be useful for training computers to understand a wide range of documents. |
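The medium summary mentions replacing the FedAvg baseline with centralized (server-side) adaptive optimization. Below is a minimal NumPy sketch of what one such communication round could look like, in the spirit of FedAdam-style server optimizers; the function names (`client_update`, `server_adaptive_step`), the toy random "gradient", and all hyperparameters are illustrative assumptions, not details taken from the paper or its code.

```python
# Sketch of federated training with a server-side adaptive update.
# Everything here is illustrative; the paper's actual DocVQA model,
# datasets, and hyperparameters are not reproduced.
import numpy as np

def client_update(global_weights, local_data, lr=1e-3, epochs=1):
    """Hypothetical local training step: each client starts from the global
    weights, trains on its private documents, and returns its weight delta.
    The gradient below is a random stand-in, not a real model gradient."""
    w = global_weights.copy()
    for _ in range(epochs):
        for _x, _y in local_data:
            grad = np.random.randn(*w.shape) * 0.01  # stand-in gradient
            w -= lr * grad
    return w - global_weights  # delta sent back to the server

def server_adaptive_step(global_weights, client_deltas, state, server_lr=0.1,
                         beta1=0.9, beta2=0.99, eps=1e-8):
    """Average the client deltas (FedAvg-style) and apply them with an
    Adam-like server optimizer instead of adding the raw average."""
    delta = np.mean(client_deltas, axis=0)
    state["m"] = beta1 * state["m"] + (1 - beta1) * delta
    state["v"] = beta2 * state["v"] + (1 - beta2) * delta**2
    return global_weights + server_lr * state["m"] / (np.sqrt(state["v"]) + eps)

# Toy driver: a few "clients", each holding private (x, y) pairs.
dim = 8
global_w = np.zeros(dim)
opt_state = {"m": np.zeros(dim), "v": np.zeros(dim)}
clients = [[(np.random.randn(dim), 0) for _ in range(4)] for _ in range(3)]

for _ in range(5):  # communication rounds
    deltas = [client_update(global_w, data) for data in clients]
    global_w = server_adaptive_step(global_w, deltas, opt_state)
```

The difference from plain FedAvg is confined to `server_adaptive_step`: rather than adding the averaged delta directly, the server keeps Adam-like moment estimates and rescales the update, which is the sense in which the optimization is centralized and adaptive in this sketch.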
Keywords
» Artificial intelligence » Federated learning » Multi modal » Optimization » Pretraining » Question answering