Summary of Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study, by Lena Schmidt et al.
Exploring the use of a Large Language Model for data extraction in systematic reviews: a rapid feasibility study
by Lena Schmidt, Kaitlyn Hair, Sergio Graziosi, Fiona Campbell, Claudia Kapp, Alireza Khanteymoori, Dawn Craig, Mark Engelbert, James Thomas
First submitted to arXiv on: 23 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper presents a feasibility study on using GPT-4, a large language model (LLM), to automate data extraction in systematic reviews. The authors conducted two studies during the 2023 Evidence Synthesis Hackathon: one to automatically extract study characteristics from publications across several domains, and another to predict the Participants, Interventions, Controls, and Outcomes (PICOs) labeled within abstracts in the EBM-NLP dataset. The results showed an accuracy of around 80%, with some variability between domains; causal inference methods and study design were the most challenging data extraction items. Evaluation was done manually, and automated scoring methods such as BLEU and ROUGE showed limited value. The authors also observed variability in the LLM's predictions and changes in response quality over time. The paper offers a template for future evaluations of LLMs in systematic review automation, highlighting potential benefits but cautioning against integrating models like GPT-4 into such tools without further research on stability and reliability. Illustrative sketches of the extraction and scoring steps follow the table. |
| Low | GrooveSquid.com (original content) | This study explores using a large language model (LLM) called GPT-4 to help with data extraction in systematic reviews. The researchers tried two things: seeing whether the LLM could extract important information from different types of studies, and using it to identify key details such as who the participants were, what interventions they received, and what outcomes resulted. They found that the LLM was pretty good at this, getting around 80% of the answers correct. There were challenges, though, especially with more complex items such as causal inference methods and study design. The researchers also found that the LLM's responses varied depending on the type of study and even changed over time. The study suggests that an LLM like GPT-4 could help automate certain tasks in systematic reviews, but more research is needed to make sure it works reliably in real-world situations. |
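To make the extraction step concrete, here is a minimal sketch of the kind of GPT-4 data extraction call the study evaluated. It assumes the OpenAI Python client and an `OPENAI_API_KEY` in the environment; the prompt wording, the `extract_pico` helper, and the example abstract are illustrative assumptions, not the prompts or data used in the paper.

```python
# Minimal sketch of an LLM-based PICO extraction call, assuming the OpenAI
# Python client (pip install openai) and OPENAI_API_KEY set in the
# environment. The prompt wording is illustrative, not from the paper.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Extract the Participants, Interventions, Controls, and Outcomes "
    "(PICO) from the abstract below. Reply as JSON with the keys "
    "'participants', 'interventions', 'controls', 'outcomes'.\n\n"
    "Abstract:\n{abstract}"
)


def extract_pico(abstract: str, model: str = "gpt-4") -> str:
    """Ask the model for the PICO elements of a single abstract."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # lowers, but does not remove, run-to-run variability
        messages=[{"role": "user", "content": PROMPT.format(abstract=abstract)}],
    )
    return response.choices[0].message.content


print(extract_pico(
    "We randomised 120 adults with chronic insomnia to eight weeks of "
    "cognitive behavioural therapy or a waitlist control and measured "
    "sleep-onset latency."
))
```

Setting `temperature=0` is one common way to tame the response variability the authors observed, though the paper notes that outputs can still change over time.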
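The authors found automated overlap metrics such as BLEU and ROUGE of limited value for judging extraction quality, so the sketch below only illustrates how such scores are typically computed. It assumes the third-party `nltk` and `rouge-score` packages; the reference and prediction strings are made-up examples.

```python
# Sketch of the automated overlap scoring (BLEU / ROUGE) that the paper
# found to be of limited value for this task. Assumes the third-party
# packages nltk and rouge-score (pip install nltk rouge-score).
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from rouge_score import rouge_scorer

reference = "adults aged 18-65 with chronic insomnia"   # human-extracted item
prediction = "adults with chronic insomnia"             # LLM-extracted item

# BLEU compares n-gram overlap of token lists; smoothing avoids zero
# scores on the short strings typical of data extraction items.
bleu = sentence_bleu(
    [reference.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-L measures longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
scores = scorer.score(reference, prediction)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-L F1: {scores['rougeL'].fmeasure:.3f}")
```

Because both metrics measure surface n-gram overlap, a correct extraction phrased differently from the reference scores poorly, which is consistent with the paper's finding that manual evaluation was more informative.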
Keywords
» Artificial intelligence » BLEU » GPT » Inference » Large language model » NLP » ROUGE