Summary of Leveraging Large Language Models in Code Question Answering: Baselines and Issues, by Georgy Andryushchenko et al.


Leveraging Large Language Models in Code Question Answering: Baselines and Issues

by Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev

First submitted to arXiv on: 5 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a method for using large language models to answer questions about Python source code. The approach fine-tunes a model on a unified dataset of question-answer pairs prepared with three levels of preprocessing: no grammar correction, grammar correction, and summary generation. The authors evaluate the model with BLEU-4, BERTScore F1, BLEURT, and Exact Match, and supplement these scores with conclusions from a manual error analysis. The study highlights current challenges in the field, notably the poor quality of public datasets, and reports that grammar correction of the training data improves results. The findings can inform other researchers working to improve source code question-answering solutions.
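
To make the evaluation setup concrete, here is a minimal sketch of how answer quality might be scored with two of the metrics named above, Exact Match and BLEU-4. It assumes plain-string predictions and references and uses NLTK's sentence-level BLEU with smoothing; it is an illustration of the metrics, not the paper's actual evaluation code.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def exact_match(prediction: str, reference: str) -> float:
    # Exact Match: 1.0 only when the stripped strings are identical.
    return float(prediction.strip() == reference.strip())

def bleu4(prediction: str, reference: str) -> float:
    # BLEU-4: uniform weights over 1- to 4-grams; smoothing avoids
    # zero scores on short answers that share no 4-grams.
    smooth = SmoothingFunction().method1
    return sentence_bleu(
        [reference.split()],   # list of tokenized reference answers
        prediction.split(),    # tokenized model answer
        weights=(0.25, 0.25, 0.25, 0.25),
        smoothing_function=smooth,
    )

prediction = "It returns the sum of a and b."
reference = "The function returns the sum of a and b."
print(f"EM={exact_match(prediction, reference):.2f}",
      f"BLEU-4={bleu4(prediction, reference):.4f}")
```

Note how Exact Match rewards only verbatim agreement, while BLEU-4 gives partial credit for n-gram overlap; this is why the paper pairs surface metrics with learned ones like BERTScore and BLEURT.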

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about using special computer models to help people understand software code better. These models are trained on a big dataset of questions and answers about Python code and then tested to see how well they work. The authors tried different ways of preparing the training data (no grammar correction, grammar correction, and adding summaries) and measured performance with special metrics like BLEU-4 and Exact Match. They found that grammar correction improved the model's performance, but there are still problems with the quality of public datasets in this area. This research can help others working on similar projects build better models.
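
For illustration, the sketch below shows how the three data-preparation variants mentioned in the summaries (raw questions, grammar-corrected questions, and questions augmented with code summaries) might be produced for one training example. The helpers correct_grammar and summarize_code are hypothetical placeholders, not the paper's actual preprocessing tools.

```python
def correct_grammar(question: str) -> str:
    # Hypothetical placeholder: in practice this could call a
    # grammar-correction model; here it returns the text unchanged.
    return question

def summarize_code(code: str) -> str:
    # Hypothetical placeholder: in practice this could call a
    # code-summarization model; here it returns the first line.
    return code.splitlines()[0] if code else ""

def build_variants(example: dict) -> dict:
    # Produce the three preprocessing variants for one QA example.
    raw = example["question"]
    corrected = correct_grammar(raw)
    augmented = f"{corrected}\nCode summary: {summarize_code(example['code'])}"
    return {
        "no_correction": raw,
        "grammar_corrected": corrected,
        "summary_augmented": augmented,
    }

example = {
    "question": "what this function do?",
    "code": "def add(a, b):\n    return a + b",
    "answer": "It returns the sum of a and b.",
}
print(build_variants(example))
```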

Keywords

  » Artificial intelligence  » BLEU  » Fine-tuning  » Question answering