
Summary of "Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance", by Rachith Aiyappa et al.


Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance

by Rachith Aiyappa, Shruthi Senthilmani, Jisun An, Haewoon Kwak, Yong-Yeol Ahn

First submitted to arXiv on: 1 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
This version is the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper investigates the performance of large language model (LLM)-based zero-shot stance detection on tweets using FlanT5-XXL. The authors study how different prompts, decoding strategies, and potential biases affect the model's performance on three datasets: SemEval 2016 Task 6A, SemEval 2016 Task 6B, and P-Stance. The results show that the zero-shot approach can match or outperform state-of-the-art benchmarks, including fine-tuned models. The authors also provide insights into the model's sensitivity to instructions, decoding strategies, the perplexity of prompts, and negations and oppositions present in prompts.
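To make the setup concrete, here is a minimal sketch of how zero-shot stance detection with an instruction-tuned model such as FlanT5-XXL can be run through the Hugging Face transformers library. The prompt wording, the favor/against/none label set, and greedy decoding used here are illustrative assumptions for this sketch; the paper's actual prompts and decoding strategies may differ and are exactly what it compares.

```python
# Minimal sketch of zero-shot stance detection with an instruction-tuned
# encoder-decoder model. The prompt template, label set, and greedy decoding
# are illustrative assumptions, not the paper's exact configuration.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-xxl"  # a smaller checkpoint (e.g. flan-t5-base) is easier to test locally

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def detect_stance(tweet: str, target: str) -> str:
    # Zero-shot: the task is described entirely in the prompt; no fine-tuning.
    prompt = (
        f"Tweet: {tweet}\n"
        f"Question: What is the stance of the tweet toward {target}? "
        "Answer with favor, against, or none.\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding; the paper also examines alternative decoding strategies.
    output_ids = model.generate(**inputs, max_new_tokens=5)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()

print(detect_stance("Climate change is the biggest threat we face.", "climate action"))
```

Because the model's answer depends on how the instruction is phrased and how the output is decoded, small changes to this template or to the generation settings can shift the predicted label, which is the kind of sensitivity the paper quantifies.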
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper looks at how well a special kind of AI model can figure out what people think about something by reading tweets, without being taught beforehand. The authors used a really powerful language model called FlanT5-XXL to see if it could do this job as well as or better than other models that were trained specifically for this task. The results showed that the zero-shot approach, where the AI isn't trained on any task-specific data, can be just as good or even better than the trained models. This paper helps us understand how these AI models work and what makes them so good at understanding people's opinions.

Keywords

* Artificial intelligence
* Language model
* Large language model
* Perplexity
* Zero shot