Summary of BUSTER: a “BUSiness Transaction Entity Recognition” Dataset, by Andrea Zugarini, Andrew Zamai, Marco Ernandes and Leonardo Rigutini
BUSTER: a “BUSiness Transaction Entity Recognition” dataset
by Andrea Zugarini, Andrew Zamai, Marco Ernandes, Leonardo Rigutini
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper tackles the challenge of transferring Natural Language Processing (NLP) advancements into practical business applications, particularly in vertical domains like finance, law, and health. The issue lies in the disparity between popular benchmarks and real-world data, which often suffers from scarce supervision, unbalanced classes, noisy data, and long documents. To address this, the researchers present BUSTER, a Business Transaction Entity Recognition dataset comprising 3,779 manually annotated financial transaction documents. They establish various baselines using both general-purpose and domain-specific language models, and use the best-performing model to automatically annotate an additional 6,196 documents, released as a silver corpus alongside BUSTER. |
| Low | GrooveSquid.com (original content) | This paper is about making artificial intelligence work better for real-life problems in areas like money, law, and health. Right now, there’s a gap between how AI is tested in labs and what really happens in these fields. The problem is that real-world data often has issues like missing labels, uneven classes, noisy information, or long documents. To help fix this, the researchers created BUSTER, a special dataset for recognizing business transactions. It contains 3,779 documents with manual annotations. They also tested several AI models on this data and used one of them to automatically label another 6,196 documents, which they released as more training data. |
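To make the entity-recognition task a bit more concrete, here is a minimal sketch of how character-level span annotations, the usual raw format of NER corpora, can be converted into token-level BIO tags. The example sentence, entity labels (`BUYER`, `ACQUIRED`), and the helper function are purely illustrative assumptions, not BUSTER's actual annotation schema.

```python
# Illustrative sketch: converting character-span entity annotations
# into token-level BIO tags, as NER corpora are typically prepared.
# The sentence, spans, and label names are hypothetical examples,
# not taken from the BUSTER dataset itself.

def to_bio(text, spans):
    """Whitespace-tokenize `text` and tag each token with B-/I-/O
    labels derived from (start, end, label) character spans."""
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate token in the raw text
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # B- for the span's first token, I- for continuations
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged

sentence = "Acme Corp acquired Beta Ltd for 50 million dollars."
annotations = [(0, 9, "BUYER"), (19, 27, "ACQUIRED")]
for token, tag in to_bio(sentence, annotations):
    print(token, tag)
```

A model trained on such a corpus then predicts one BIO tag per token, and contiguous B-/I- runs are decoded back into entity spans at evaluation time.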
Keywords
* Artificial intelligence
* Natural language processing
* NLP