Summary of Towards Measuring Fairness in Speech Recognition: Fair-Speech Dataset, by Irina-Elena Veliche et al.
Towards measuring fairness in speech recognition: Fair-Speech dataset
by Irina-Elena Veliche, Zhuangqun Huang, Vineeth Ayyat Kochaniyan, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer
First submitted to arXiv on: 22 Aug 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper addresses a gap in current public datasets for automatic speech recognition (ASR): they often overlook fairness, in particular performance disparities across demographic groups. To bridge this gap, the researchers introduce Fair-Speech, a publicly released corpus designed to evaluate ASR accuracy across demographic attributes including age, gender, ethnicity, geographic variation, and whether the speaker is a native English speaker. The dataset comprises approximately 26.5K utterances recorded by 593 people in the United States, who were paid to record and submit audio clips of themselves saying voice commands. The paper also provides ASR baselines from two kinds of models: models trained on transcribed and untranscribed social media videos, and open-source models. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper creates a new dataset for checking whether machines understand everyone’s voice equally well. Right now, many public datasets for speech recognition don’t consider fairness, meaning how well machines perform for different groups of people. The researchers want to change this by introducing the Fair-Speech dataset, which has over 26,000 recordings from 593 people in the United States. The speakers differ in age, gender, ethnicity, geographic location, and whether English is their native language. The goal is to measure how well machines recognize voices across these groups, so that gaps can be found and fixed. |
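
To make "evaluating accuracy across demographic groups" concrete, here is a minimal Python sketch of one way such a comparison could be computed. It assumes word error rate (WER) as the accuracy measure, which is the standard ASR metric but is not named in the summaries above. The record fields (`reference`, `hypothesis`, `age`), the function names, and the toy data are all hypothetical and are not taken from the Fair-Speech release.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(utterances, group_key):
    """Return {group: WER}, pooling errors and reference words per group."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for utt in utterances:
        ref = utt["reference"].lower().split()
        hyp = utt["hypothesis"].lower().split()
        group = utt[group_key]
        errors[group] += edit_distance(ref, hyp)
        words[group] += len(ref)
    # Skip groups with no reference words to avoid dividing by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Made-up toy records (hypothetical schema), not drawn from Fair-Speech.
toy = [
    {"reference": "turn on the lights", "hypothesis": "turn on the lights", "age": "18-30"},
    {"reference": "play some music", "hypothesis": "play sum music", "age": "18-30"},
    {"reference": "set a timer", "hypothesis": "set timer", "age": "31-45"},
]

if __name__ == "__main__":
    for group, wer in sorted(wer_by_group(toy, "age").items()):
        print(f"{group}: WER = {wer:.3f}")
```

Pooling errors and word counts within each group, rather than averaging per-utterance WER, keeps short utterances from dominating a group's score. A large gap between groups' WER values is the kind of disparity a fairness-oriented dataset is meant to surface.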