Summary of CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling, by Qi Zhang et al.
CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling
by Qi Zhang, Yonghong Song, Pengcheng Guo, Yangyang Hui
First submitted to arXiv on: 19 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The proposed semi-supervised approach, Class-Rebalancing and Merged Semantic Pseudo-Labeling (CRMSP), targets two challenges in key information extraction: the underestimated confidence of tail classes in long-tailed distributions, and the difficulty of achieving intra-class compactness and inter-class separability. CRMSP consists of two modules: Class-Rebalancing Pseudo-Labeling (CRP) and Merged Semantic Pseudo-Labeling (MSP). CRP introduces a reweighting factor to rebalance pseudo-labels, increasing attention to tail classes. MSP clusters unlabeled data by assigning samples to Merged Prototypes (MP) and uses a new contrastive loss designed specifically for this module. Experiments on three benchmarks show state-of-the-art performance, including a 3.24% F1-score improvement over the previous state of the art on the CORD dataset. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary CRMSP is a new approach to key information extraction that uses semi-supervised learning, which needs fewer hand-labeled examples and so saves time and money. The method gives more attention to rare classes and makes it easier to tell different classes apart. It works by using two modules: one that adjusts how important each class is, and another that groups similar data points together. The results show that CRMSP does better than other methods on three well-known datasets. |
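
The two ideas in the summaries above, reweighting pseudo-labels toward rare classes and assigning samples to prototypes, can be illustrated with a small Python sketch. This is not the paper's method: the summaries do not give the CRP reweighting factor, the way Merged Prototypes are built, or the contrastive loss, so the sketch assumes a plain inverse-frequency weight and a nearest-prototype rule by cosine similarity. The names `rebalancing_weights`, `assign_to_prototypes`, and the `power` parameter are hypothetical.

```python
import math
from collections import Counter


def rebalancing_weights(pseudo_labels, num_classes, power=1.0):
    """Per-class weights that grow as a class gets rarer (mean weight is 1).

    Assumption: simple inverse-frequency weighting, used here only to
    illustrate the idea of rebalancing; it is not the paper's factor.
    """
    counts = Counter(pseudo_labels)
    # Add 1 to every count so a class with no pseudo-labels gets a finite weight.
    raw = [(1.0 / (counts.get(c, 0) + 1)) ** power for c in range(num_classes)]
    mean = sum(raw) / num_classes
    return [w / mean for w in raw]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_to_prototypes(features, prototypes):
    """Index of the most similar prototype for each feature vector.

    Assumption: prototypes are given; how the paper merges them is not modeled.
    """
    return [
        max(range(len(prototypes)), key=lambda k: cosine(f, prototypes[k]))
        for f in features
    ]


# Toy long-tailed pseudo-labels: class 0 is the head, class 2 is the tail.
pseudo_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = rebalancing_weights(pseudo_labels, num_classes=3)

# Toy 2-D features and two prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.2, 0.8]]
assignments = assign_to_prototypes(features, prototypes)
```

In this toy setup the tail class receives the largest weight and each feature is assigned to the prototype it points toward. In a training loop, weights like these would typically scale the per-sample loss on pseudo-labeled data.
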
Keywords
» Artificial intelligence » Attention » Contrastive loss » F1 score » Semi supervised