SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection
by Bradley P. Allen, Fina Polat, Paul Groth
First submitted to arXiv on: 4 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The SHROOM-INDElab system, developed by the University of Amsterdam Intelligent Data Engineering Lab team, participated in SemEval-2024 Task 6. The system builds on previous work on prompt programming and in-context learning with large language models (LLMs) to create classifiers for hallucination detection, extending it with context-specific definitions of the task, role, and target concept, and with automated generation of examples for a few-shot prompting approach. SHROOM-INDElab achieved the fourth-best performance in the model-agnostic track and the sixth-best in the model-aware track of Task 6. Evaluation on the validation sets showed that the system's classification decisions were consistent with those of crowd-sourced human labellers. Notably, a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. |
Low | GrooveSquid.com (original content) | The University of Amsterdam team created a special computer program to help spot when AI language models make things up. They used large language models and some new techniques to make it work. The program did pretty well in a competition against other teams. Its decisions even matched those of humans who looked at the same information! The researchers found that just asking the model directly, without giving it any examples, sometimes worked better than teaching it with automatically generated examples. |
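To make the approach in the summaries above concrete, here is a minimal sketch of a zero-shot LLM-based hallucination classifier with context-specific definitions of the task, role, and target concept. The function names (`build_prompt`, `classify`), the prompt wording, and the stubbed model call are illustrative assumptions, not the authors' actual prompts or code; a real system would pass in a function that calls an LLM API.

```python
def build_prompt(task_def, role, target_concept, source, hypothesis):
    """Compose a zero-shot prompt embedding context-specific definitions."""
    return (
        f"Role: {role}\n"
        f"Task: {task_def}\n"
        f"Definition of hallucination: {target_concept}\n\n"
        f"Source: {source}\n"
        f"Hypothesis: {hypothesis}\n\n"
        "Answer with exactly one label: 'Hallucination' or 'Not Hallucination'."
    )

def classify(source, hypothesis, call_llm):
    """call_llm is any function mapping a prompt string to model output text."""
    prompt = build_prompt(
        task_def="Decide whether the hypothesis is supported by the source.",
        role="You are an annotator checking machine-generated text.",
        target_concept="Content in the hypothesis not supported by the source.",
        source=source,
        hypothesis=hypothesis,
    )
    answer = call_llm(prompt).strip()
    # Map the free-text answer onto the two task labels.
    if answer.lower().startswith("hallucination"):
        return "Hallucination"
    return "Not Hallucination"

# Usage with a stubbed model (replace fake_llm with a real LLM API call):
fake_llm = lambda prompt: "Hallucination"
print(classify("The cat sat on the mat.", "The dog sat on the mat.", fake_llm))
# prints: Hallucination
```

A few-shot variant would simply append automatically generated labelled examples to the prompt before the item being classified; per the findings above, that did not improve over this zero-shot setup.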
Keywords
» Artificial intelligence » Classification » Few shot » Hallucination » Prompt » Prompting » Zero shot