Summary of Learning Robust Named Entity Recognizers From Noisy Data with Retrieval Augmentation, by Chaoyi Ai et al.


Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation

by Chaoyi Ai, Yong Jiang, Shen Huang, Pengjun Xie, Kewei Tu

First submitted to arXiv on: 26 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a new approach for making named entity recognition (NER) models robust to noisy inputs, such as text containing spelling mistakes or errors introduced by Optical Character Recognition (OCR). The method retrieves relevant text from a knowledge corpus, concatenates it with the original noisy input, and encodes the combined text with a transformer network to improve the representation of the noisy input. The authors design three retrieval methods: sparse retrieval based on lexicon similarity, dense retrieval based on semantic similarity, and self-retrieval based on task-specific text. They also employ a multi-view training framework that improves robust NER even when no text is retrieved during inference. Experimental results show significant improvements in various noisy NER settings.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making computers better at recognizing important words and phrases, such as the names of people and places, even when the input text contains mistakes. This can happen when old books are scanned and turned into digital files, for example. The problem is that most current methods assume a clean, error-free version of the text (called “gold text”) is available during training, which is often not the case. So this paper proposes a new approach where computers can still improve their recognition abilities without the clean version. They do this by finding relevant passages in a large collection of text and combining them with the noisy input, making it easier for computers to recognize the important words and phrases.
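
The retrieve-and-concatenate idea from the medium difficulty summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a simple TF-IDF cosine score stands in for the paper's sparse retriever, the `[SEP]` separator string is a placeholder for whatever the downstream transformer tokenizer expects, and the function names (`retrieve`, `augment`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokens; a real system would use a proper tokenizer.
    return text.lower().split()


def build_idf(corpus):
    # Smoothed inverse document frequency for every term in the corpus.
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, corpus, k=1):
    # Return up to k corpus passages with non-zero lexical overlap, best first.
    idf = build_idf(corpus)
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(doc, idf)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def augment(noisy_text, corpus, k=1, sep=" [SEP] "):
    # Concatenate the noisy input with its retrieved context (assumed format).
    return sep.join([noisy_text] + retrieve(noisy_text, corpus, k))


corpus = [
    "Barack Obama was born in Hawaii .",
    "Paris is the capital of France .",
    "The stock market fell sharply on Monday .",
]
noisy = "Barack 0bama visited Par1s"  # OCR-style character errors
augmented = augment(noisy, corpus)
```

In this toy example only the correctly spelled token "Barack" matches the corpus, so the first passage is retrieved and appended. Word-level matching misses the corrupted tokens "0bama" and "Par1s"; character n-grams would be one way to make a lexical retriever more tolerant of such noise.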

Keywords

* Artificial intelligence  * Inference  * Named entity recognition  * NER  * Transformer