
Summary of CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling, by Qi Zhang et al.


CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling

by Qi Zhang, Yonghong Song, Pengcheng Guo, Yangyang Hui

First submitted to arXiv on: 19 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed semi-supervised approach, Class-Rebalancing and Merged Semantic Pseudo-Labeling (CRMSP), addresses two challenges that arise with long-tailed distributions: underestimated confidence for tail classes, and the difficulty of achieving intra-class compactness and inter-class separability. CRMSP consists of two modules: Class-Rebalancing Pseudo-Labeling (CRP) and Merged Semantic Pseudo-Labeling (MSP). CRP introduces a reweighting factor to rebalance pseudo-labels, increasing attention to tail classes. MSP clusters unlabeled data by assigning samples to Merged Prototypes (MP), using a new contrastive loss designed specifically for this module. Experimental results on three benchmarks demonstrate state-of-the-art performance, including a 3.24% F1-score improvement over the previous state of the art on the CORD dataset.

Low Difficulty Summary (written by GrooveSquid.com, original content)
CRMSP is a new approach to key information extraction that uses semi-supervised learning, which learns from both labeled and unlabeled data, to save time and money. The method gives more attention to rare classes and makes it easier to tell different classes apart. It works by using two special modules: one that adjusts how important each class is, and another that groups similar data points together. The results show that CRMSP does better than other methods on three well-known datasets.
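
The class-rebalancing idea described in the medium difficulty summary can be illustrated with a minimal sketch. The code below is not the paper's formulation: the inverse-frequency rule, the function names `rebalancing_weights` and `weighted_pseudo_label_loss`, and the `power` parameter are all assumptions made for illustration. It only shows how a per-class reweighting factor might give tail classes more influence on a pseudo-label loss.

```python
from collections import Counter


def rebalancing_weights(pseudo_labels, num_classes, power=1.0):
    """Return one weight per class, larger for rarer pseudo-label classes.

    Hypothetical rule (an assumption, not taken from the paper):
    weight_c is proportional to (1 / (count_c + 1)) ** power,
    normalised so that the weights average to 1 across classes.
    """
    counts = Counter(pseudo_labels)
    raw = [(1.0 / (counts.get(c, 0) + 1)) ** power for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [r / mean for r in raw]


def weighted_pseudo_label_loss(per_sample_losses, pseudo_labels, weights):
    """Average the per-sample losses after scaling each by its class weight."""
    total = sum(
        loss * weights[label]
        for loss, label in zip(per_sample_losses, pseudo_labels)
    )
    return total / len(per_sample_losses)


# Toy long-tailed batch: class 0 is the head class, class 2 is the tail class.
pseudo_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = rebalancing_weights(pseudo_labels, num_classes=3)

# Placeholder per-sample losses; in practice these would come from a model.
losses = [0.5] * len(pseudo_labels)
loss = weighted_pseudo_label_loss(losses, pseudo_labels, weights)
```

In this toy batch the tail class receives the largest weight and the head class the smallest, so a mistake on a rare class contributes more to the loss than a mistake on a common one. The actual reweighting factor used by CRP may differ substantially from this rule.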

Keywords

» Artificial intelligence  » Attention  » Contrastive loss  » F1 score  » Semi-supervised