
Summary of "Evaluating Robustness of LLMs on Crisis-Related Microblogs Across Events, Information Types, and Linguistic Features," by Muhammad Imran et al.


by Muhammad Imran, Abdul Wahab Ziaullah, Kai Chen, Ferda Ofli

First submitted to arXiv on: 8 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates the performance of six Large Language Models (LLMs) in processing disaster-related social media data from real-world events. Unlike traditional supervised machine learning approaches, LLMs are shown to offer better generalizability across events. The study finds that GPT-4o and GPT-4 perform well across different disasters and information types. However, most LLMs struggle with flood-related data, show minimal improvement even when provided with examples (few-shot), and have difficulty identifying critical information categories such as urgent requests and needs. The study also examines how linguistic features affect model performance, revealing vulnerabilities to certain features such as typos. The paper provides benchmarking results for all events in zero- and few-shot settings, observing that proprietary models outperform open-source ones on all tasks. (A rough illustration of the zero- vs few-shot setup appears after these summaries.)

Low Difficulty Summary (original content by GrooveSquid.com)
This study looks at how well large language models can process social media data during disasters. Unlike traditional methods, these models do not need task-specific labeled training data. The researchers tested six of these models on real-world disaster data and found that some did better than others. Certain models were good at understanding information from different types of disasters, but most struggled with flood-related data. The study also looked at how features like typos affect the models' performance. Overall, the results show that these language models still have room for improvement, especially when it comes to identifying important information, such as urgent requests, during emergencies.
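To make the zero- and few-shot settings mentioned above concrete, here is a minimal Python sketch of how a crisis-related tweet might be classified into an information-type category with an LLM. This is not the authors' code or taxonomy: the category names, example tweets, prompt wording, and use of the OpenAI Python client are illustrative assumptions only.

# Minimal sketch (illustrative, not the paper's implementation): zero- vs
# few-shot classification of a crisis-related tweet into an information type.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical label set; the paper's actual information types may differ.
LABELS = ["caution_and_advice", "urgent_needs_or_requests",
          "infrastructure_damage", "sympathy_and_support", "not_humanitarian"]

ZERO_SHOT_PROMPT = (
    "Classify the following disaster-related tweet into exactly one of these "
    f"categories: {', '.join(LABELS)}.\nTweet: {{tweet}}\nCategory:"
)

# Few-shot setting: prepend a handful of labeled examples before the target tweet.
FEW_SHOT_EXAMPLES = (
    "Tweet: Water levels rising fast near the bridge, we need boats now!\n"
    "Category: urgent_needs_or_requests\n\n"
    "Tweet: Stay away from downtown, roads are closed due to flooding.\n"
    "Category: caution_and_advice\n\n"
)

def classify(tweet: str, few_shot: bool = False, model: str = "gpt-4o") -> str:
    """Return the model's predicted category for a single tweet."""
    prompt = (FEW_SHOT_EXAMPLES if few_shot else "") + ZERO_SHOT_PROMPT.format(tweet=tweet)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labeling for benchmarking
    )
    return response.choices[0].message.content.strip()

print(classify("Power lines are down across the whole neighborhood", few_shot=True))

Running the same prompt with and without the prepended examples is what distinguishes the few-shot and zero-shot settings; the paper reports that most models gain little from the added examples.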

Keywords

» Artificial intelligence  » Few-shot  » GPT  » Machine learning  » Supervised