
Summary of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs, by Akari Asai et al.


OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

by Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Dan Weld, Doug Downey, Wen-tau Yih, Pang Wei Koh, Hannaneh Hajishirzi

First submitted to arXiv on: 21 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Information Retrieval (cs.IR); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces OpenScholar, a retrieval-augmented language model that helps scientists synthesize the literature by retrieving relevant passages from a datastore of 45 million open-access papers and generating citation-backed answers. The authors also develop ScholarQABench, a benchmark for evaluating literature search that comprises expert-written queries and answers across four domains: computer science, physics, neuroscience, and biomedicine. On this benchmark, OpenScholar-8B outperforms GPT-4o and PaperQA2 in correctness by 5% and 7%, respectively. OpenScholar’s citation accuracy is on par with that of human experts, whereas GPT-4o hallucinates citations 78-90% of the time. The authors further show gains when pairing OpenScholar’s retrieval pipeline with off-the-shelf LMs such as GPT-4o. In human evaluations, experts prefer OpenScholar-8B and OpenScholar-GPT-4o responses over expert-written ones 51% and 70% of the time, respectively. (A minimal sketch of this retrieve-then-generate loop follows the summaries below.)
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps scientists by using big computers to read and understand many research papers. The authors make a special computer called OpenScholar that can find important parts in these papers and write answers based on what it finds. They test OpenScholar with thousands of questions and compare it to other computers that do similar things. OpenScholar is better at getting the right answer than those other computers, and people who know a lot about science like its answers even more. The authors are sharing all their code and models so that others can use them too.
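
The medium-difficulty summary above describes a retrieve-then-generate pipeline: search a large datastore of open-access papers for relevant passages, then have a language model compose an answer that cites those passages. The sketch below illustrates that general pattern only; the toy corpus, the lexical score() function, and the generate() stub are hypothetical placeholders, not OpenScholar’s actual datastore, trained retriever, or iterative self-feedback loop.

```python
# Minimal sketch of a retrieve-then-generate loop with citation markers.
# Everything here is a placeholder for illustration: a real system would use a
# trained retriever over millions of passages and a language model for generation.

from collections import Counter

# Toy stand-in for a passage datastore: passage id -> passage text.
CORPUS = {
    "smith2021:3": "Retrieval-augmented language models ground their outputs in retrieved passages.",
    "lee2020:7": "Citation accuracy measures whether a cited passage actually supports the claim.",
    "chen2019:1": "Dense retrievers embed queries and passages into a shared vector space.",
}


def score(query: str, passage: str) -> float:
    """Crude lexical-overlap relevance score (placeholder for a trained retriever)."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum((q & p).values())


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k passages with the highest relevance score for the query."""
    ranked = sorted(CORPUS.items(), key=lambda item: score(query, item[1]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Format passages with bracketed ids so the model can cite them explicitly."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Answer the question using only the passages below, citing them by id.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


def generate(prompt: str) -> str:
    """Stub for the language-model call (e.g., an open 8B model or an API model)."""
    return "(model output citing passages like [smith2021:3] would appear here)"


if __name__ == "__main__":
    question = "How do retrieval-augmented language models reduce citation hallucination?"
    top_passages = retrieve(question)
    print(generate(build_prompt(question, top_passages)))
```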

Keywords

» Artificial intelligence  » GPT  » Large language model