Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation
by Huangyu Dai, Ben Chen, Kaidi Chen, Ying Han, Zihan Liang, Wen Jiang
First submitted to arXiv on: 30 Sep 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The paper investigates the issue of monotony and repetition in Neural Machine Translation (NMT) generated content, a crucial challenge for cross-lingual communication and trade. Traditional solutions have shown limited efficacy, particularly for lengthy texts with inherent redundancy. The authors attribute the phenomenon to elevated uncertainty within the input text and propose a novel algorithm called Contrastive Token Learning with Similarity Decay (CTSD). CTSD modulates token suppression dynamically, informed by attention weights and inter-token distances. The paper evaluates CTSD on an e-commerce dataset and shows significant improvements in precision and generalizability over existing approaches.
Low | GrooveSquid.com (original content) | This paper is about a way to make machine translation better. Right now, translated texts can sound really repetitive and boring. Researchers have tried different ways to fix this problem, but it hasn't worked very well. The authors think the issue comes from the confusing nature of some input texts. To solve this, they created a new method called CTSD, which adjusts how strongly repeated words are suppressed based on their distance and importance. They tested this method on online product descriptions and showed it performs better than other methods. This could be useful for websites like Alibaba.com.
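The medium-difficulty summary says CTSD modulates token suppression using attention weights and inter-token distances. As a rough illustration only (not the paper's actual formulation), the sketch below penalizes the logits of previously generated tokens, with a penalty that scales with each token's attention weight and decays with its distance from the current position; the `alpha` and `decay` parameters are invented for this example.

```python
def ctsd_penalty(logits, prev_tokens, attn_weights, current_pos,
                 alpha=1.0, decay=0.9):
    """Hypothetical sketch of distance-decayed, attention-weighted
    repetition suppression, loosely inspired by the CTSD idea.

    logits        : list of vocabulary logits at the current step
    prev_tokens   : token ids generated at positions 0..len-1
    attn_weights  : attention weight assigned to each previous position
    current_pos   : index of the position being generated
    """
    penalized = list(logits)
    for pos, tok in enumerate(prev_tokens):
        distance = current_pos - pos
        # Suppression fades for far-away tokens and grows with attention.
        penalty = alpha * attn_weights[pos] * (decay ** distance)
        penalized[tok] -= penalty
    return penalized
```

The key intuition this sketch captures: a token repeated immediately before the current position (small distance, high attention) is suppressed strongly, while a token far in the past contributes almost nothing, so legitimate long-range reuse of words is not blocked.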
Keywords
» Artificial intelligence » Attention » Precision » Token » Translation