Summary of “Do ‘English’ Named Entity Recognizers Work Well on Global Englishes?”, by Alexander Shan et al.
Do “English” Named Entity Recognizers Work Well on Global Englishes?
by Alexander Shan, John Bauer, Riley Carlson, Christopher Manning
First submitted to arXiv on: 20 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper examines the limitations of popular English named entity recognition (NER) datasets for analyzing global varieties of English. It introduces a newswire dataset, the Worldwide English NER Dataset, to assess how widely used NER toolkits and transformer models perform on low-resource English varieties from around the world. The results show that models trained on commonly used British or American English datasets suffer significant performance drops when tested on the global dataset, with the largest declines observed for Oceania and Africa. However, a combined model trained on the global dataset together with either CoNLL or OntoNotes maintains strong performance on both test sets. |
Low | GrooveSquid.com (original content) | The paper is about how well computer models can recognize names of people, places, and organizations in different types of English. Right now, most language models are only good at understanding American or British English, even though English is spoken in many other ways around the world. The researchers created a new dataset of news articles from all over the world to test how well these models handle global English varieties. They found that most models don’t do very well on this new dataset, especially for English varieties from Oceania and Africa. However, they were able to build a combined model that works well on both American and global English texts. |
Keywords
» Artificial intelligence » Named entity recognition (NER) » Transformer