Summary of EvalYaks: Instruction Tuning Datasets and LoRA Fine-tuned Models for Automated Scoring of CEFR B2 Speaking Assessment Transcripts, by Nicy Scaria et al.
EvalYaks: Instruction Tuning Datasets and LoRA Fine-tuned Models for Automated Scoring of CEFR B2 Speaking Assessment Transcripts
by Nicy Scaria, Silvester John Joseph Kennedy, Thomas Latinovich, Deepak Subramani
First submitted to arXiv on: 22 Aug 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract (available on the arXiv page) |
| Medium | GrooveSquid.com (original content) | The paper tackles the challenge of automating the evaluation of CEFR B2 English speaking assessments in e-learning environments. It evaluates how well leading large language models (LLMs) score candidate performances across various criteria in both global and India-specific contexts. The study creates a new synthetic conversational dataset of expert-validated, CEFR-aligned transcripts rated at different assessment scores, and derives further datasets from the English Vocabulary Profile and CEFR-SP WikiAuto datasets. Using these datasets, the authors perform parameter-efficient instruction tuning of the Mistral Instruct 7B v0.2 model, producing a family of models called EvalYaks (illustrative sketches of this tuning setup and of the reported metrics follow the table). EvalYaks achieve an average acceptable accuracy of 96% with scores deviating by only 0.35 CEFR levels on average, performing three times better than the other models evaluated. |
| Low | GrooveSquid.com (original content) | The paper is about using computers to help evaluate people's English speaking skills in online learning environments. Human evaluation takes too long and is hard to scale, so the researchers tested whether special computer programs called large language models (LLMs) could do this job well. They made a new dataset of conversations rated by experts to help train the LLMs, developed more datasets from other sources, and used them all to fine-tune a model, producing a family of models called EvalYaks. The results show that these models can accurately evaluate English speaking skills, making it easier to conduct language proficiency tests online. |
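For readers curious what "parameter-efficient instruction tuning" with LoRA looks like in practice, here is a minimal sketch assuming a Hugging Face transformers/peft workflow. The dataset file name (`cefr_b2_instructions.jsonl`), the LoRA hyperparameters, and the target modules are illustrative assumptions; the paper's actual training configuration is not given in this summary.

```python
# A minimal LoRA instruction-tuning sketch, assuming a Hugging Face
# transformers/peft workflow. The dataset file, hyperparameters, and target
# modules are illustrative assumptions, not the paper's reported setup.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# LoRA trains only small low-rank adapter matrices while the 7B base weights
# stay frozen; this is what makes the tuning "parameter-efficient".
model = get_peft_model(
    model,
    LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    ),
)
model.print_trainable_parameters()  # typically well under 1% of all weights

# Hypothetical JSONL file of instruction-tuning examples, each with a single
# "text" field containing the rubric prompt, the transcript, and the score.
dataset = load_dataset("json", data_files="cefr_b2_instructions.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="evalyaks-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        logging_steps=20,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("evalyaks-lora")  # saves only the adapter weights
```

Because only the adapter matrices are updated, a 7B model can be tuned on modest hardware, which is the usual motivation for LoRA over full fine-tuning.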
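The headline numbers can also be made concrete. The sketch below assumes that "acceptable accuracy" means a prediction within one CEFR band of the human score and that the "degree of variation" is the mean absolute deviation in levels; these definitions, the `score_report` helper, and the toy scores are all assumptions for illustration, not the paper's stated metric.

```python
# Hypothetical reading of the two reported metrics. ASSUMPTION: "acceptable
# accuracy" = share of predictions within one CEFR band of the human score;
# "degree of variation" = mean absolute deviation in levels.
def score_report(predicted, human, tolerance=1.0):
    """Return (acceptable_accuracy, mean_absolute_deviation) over paired scores."""
    deviations = [abs(p - h) for p, h in zip(predicted, human)]
    acceptable = sum(d <= tolerance for d in deviations) / len(deviations)
    return acceptable, sum(deviations) / len(deviations)

# Toy scores on a numeric CEFR-style scale (purely illustrative, not the
# paper's data). EvalYaks-like behaviour would show roughly 96% of
# predictions within tolerance and a mean deviation near 0.35 levels.
predicted = [3.0, 4.0, 4.5, 3.0, 4.0]
human = [3.0, 4.0, 4.0, 3.5, 4.0]
acc, dev = score_report(predicted, human)
print(f"acceptable accuracy = {acc:.0%}, mean deviation = {dev:.2f} levels")
# -> acceptable accuracy = 100%, mean deviation = 0.20 levels
```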
Keywords
- Artificial intelligence
- Instruction tuning
- Online learning
- Parameter efficient