
Summary of WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks, by Michael Wornow et al.


WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

by Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Ré

First submitted to arXiv on: 19 Jun 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Machine Learning (cs.LG); Software Engineering (cs.SE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
Existing benchmarks in machine learning (ML) lack the depth and diversity of annotations needed to evaluate models on business process management (BPM) tasks. Research has focused almost exclusively on full end-to-end automation using agents built on multimodal foundation models (FMs) like GPT-4, ignoring the reality that most BPM tools today are applied simply to document the relevant workflow. WONDERBREAD addresses this gap as the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Its contributions include: a dataset containing 2928 documented workflow demonstrations; 6 novel BPM tasks sourced from real-world applications, ranging from workflow documentation to knowledge transfer to process improvement; and an automated evaluation harness. WONDERBREAD shows that while state-of-the-art FMs can automatically generate documentation (e.g., recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge toward finer-grained validation of workflow completion (F1 < 0.3); a sketch of these metrics follows the summaries below. The goal is to encourage “human-centered” AI tooling for enterprise applications and to explore multimodal FMs for broader BPM tasks.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making computer models better at helping people with business processes. Business process management (BPM) is like managing a company’s workflows, but right now most research focuses on using AI to automate everything. That isn’t how people usually use BPM tools – they mostly just document the steps in a process. To fix this, the authors created WONDERBREAD, a new way to test computer models that goes beyond just automating things. They made a big dataset with 2928 examples of workflows and six different tasks, like documenting a workflow or transferring knowledge. The results show that even strong AI models struggle to check whether a workflow was completed correctly. The authors hope this will encourage more “human-centered” AI tools for businesses.
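
Metric note: the summaries above quote two numbers, a step recall of about 88% for generated documentation and an F1 below 0.3 for judging whether a workflow was completed correctly. The sketch below is only a hypothetical illustration of how such metrics are typically computed; it is not WONDERBREAD’s actual evaluation harness, and all function and variable names are invented.

```python
# Hypothetical sketch of the metrics quoted in the summaries above; NOT the
# paper's real evaluation harness. All names are illustrative.

def step_recall(predicted_steps, gold_steps):
    """Fraction of ground-truth workflow steps recovered in the model's documentation."""
    if not gold_steps:
        return 0.0
    matched = sum(1 for step in gold_steps if step in predicted_steps)
    return matched / len(gold_steps)

def binary_f1(predictions, labels):
    """F1 score for binary judgments such as 'was this workflow completed correctly?'."""
    tp = sum(1 for p, g in zip(predictions, labels) if p and g)
    fp = sum(1 for p, g in zip(predictions, labels) if p and not g)
    fn = sum(1 for p, g in zip(predictions, labels) if not p and g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Example: documenting 7 of 8 ground-truth steps gives a recall of 0.875, i.e. roughly 88%.
```

A real harness would likely need fuzzy or model-based matching of free-text workflow steps, so an exact-match check like the one above would be far too strict; the sketch is meant only to convey the shape of the reported numbers.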

Keywords

» Artificial intelligence  » GPT  » Machine learning