
Summary of Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge, by Charles Koutcheme et al.


Open Source Language Models Can Provide Feedback: Evaluating LLMs’ Ability to Help Students Using GPT-4-As-A-Judge

by Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Paul Denny

First submitted to arXiv on: 8 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (GrooveSquid.com, original content)
Large language models (LLMs) have shown potential for generating automatic feedback in various computing contexts. However, concerns about privacy and ethics have sparked interest in open-source LLMs in education, but the quality of their generated feedback remains understudied. This is a concern as flawed or misleading feedback could negatively impact student learning. Inspired by recent work using powerful LLMs to evaluate less powerful models, we conduct an automated analysis of several open-source model feedback on a dataset from an introductory programming course. We investigate GPT-4’s viability as an automated evaluator and find it demonstrates bias toward positively rating feedback while showing moderate agreement with human raters, highlighting its potential as a feedback evaluator. Additionally, we explore the quality of feedback generated by leading open-source LLMs using GPT-4 evaluation, finding some models offer competitive performance with proprietary LLMs like ChatGPT, indicating opportunities for responsible use in educational settings.
Low Difficulty Summary (GrooveSquid.com, original content)
Large language models can generate feedback for students automatically, but people worry about keeping student work private and about other ethical issues. This makes open-source models interesting for education, yet we don’t know how well they perform. If the feedback is wrong, it could hurt students’ learning. The authors looked at how well GPT-4 evaluates feedback from different models: it is biased toward rating feedback positively, but it agrees with human raters a moderate amount of the time. Some open-source models work about as well as famous proprietary ones like ChatGPT.
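The GPT-4-as-a-judge setup described above boils down to comparing one set of ratings (GPT-4's) against another (human raters'), then measuring agreement and positivity bias. A minimal sketch of how such a comparison could be quantified, using made-up binary "is this feedback correct?" judgments rather than the paper's actual data:

```python
# Hypothetical sketch: compare a judge model's ratings against human
# ratings, measuring agreement (Cohen's kappa) and positivity bias
# (difference in mean rating). All ratings below are illustrative only.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n       # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)    # chance agreement
    return (observed - expected) / (1 - expected)

# Binary judgments: 1 = feedback is correct, 0 = not correct
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
judge = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]  # rates more items positively

kappa = cohens_kappa(human, judge)
bias = sum(judge) / len(judge) - sum(human) / len(human)
print(f"kappa = {kappa:.2f}, positivity bias = {bias:+.2f}")
# → kappa = 0.55, positivity bias = +0.20
```

A kappa around 0.4-0.6 is commonly read as "moderate" agreement, and a positive bias value means the judge rates feedback favorably more often than humans do, which mirrors the pattern the paper reports for GPT-4.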

Keywords

» Artificial intelligence » GPT