
Summary of Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024, by Sai Koneru et al.


Blending LLMs into Cascaded Speech Translation: KIT’s Offline Speech Translation System for IWSLT 2024

by Sai Koneru, Thai-Binh Nguyen, Ngoc-Quan Pham, Danni Liu, Zhaolin Li, Alexander Waibel, Jan Niehues

First submitted to arXiv on: 24 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

This paper explores the potential of Large Language Models (LLMs) in speech-related tasks such as Automatic Speech Recognition (ASR), Machine Translation (MT), and End-to-End Speech Translation (ST). The authors present KIT’s submission to the constrained + LLM track of the IWSLT 2024 offline speech translation task, integrating Mistral-7B into their cascaded system to enhance ASR and MT outputs. They fine-tune the LLM to correct ASR transcripts, and then to refine the MT output at the document level using both the ASR and MT predictions. The results show an absolute improvement of 0.3% in Word Error Rate and 0.65% in COMET on the tst2019 test set. However, integrating the LLM is not beneficial on challenging test sets with overlapping speakers and background noise, where ASR performance is poor.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This paper looks at how Large Language Models (LLMs) can help with things like recognizing spoken words, translating between languages, and even turning speech in one language into written text in another. The researchers took an existing model called Mistral-7B and added it to their system to make it better. They used this model to correct the recognized speech and to polish the translated text. When they tested it, they found a small but measurable improvement on a standard test set, but no benefit when there was background noise or multiple people talking at once.
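
For readers unfamiliar with Word Error Rate (WER), the ASR metric quoted in the medium difficulty summary, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the system output, divided by the number of reference words. The function name and example sentences are illustrative assumptions, not taken from the paper, and the authors' actual evaluation tooling may normalize text differently. An "absolute" improvement means the difference between two WER values in percentage points, not a relative change.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substituted word out of four reference words.
    print(word_error_rate("please translate this talk", "please translate his talk"))
```

Lower WER is better, so an improvement corresponds to a decrease in the value this function returns.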

Keywords

» Artificial intelligence  » Fine-tuning  » Translation