
Summary of Constructing the CORD-19 Vaccine Dataset, by Manisha Singh et al.


Constructing the CORD-19 Vaccine Dataset

by Manisha Singh, Divy Sharma, Alonso Ma, Bridget Tyree, Margaret Mitchell

First submitted to arXiv on: 26 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Information Retrieval (cs.IR); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty | Written by | Summary
High | Paper authors | High Difficulty Summary
Read the original abstract here
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary
This paper introduces CORD-19-Vaccination, a new dataset designed for scientists researching COVID-19 vaccines. It augments the original CORD-19 dataset with added columns for language, author demographics, keywords, and topic. Facebook’s fastText model identifies each paper’s language, while JSON file processing and Google’s search API determine author affiliations and countries. The YAKE tool extracts keywords from paper titles, abstracts, and bodies, and the Latent Dirichlet Allocation (LDA) algorithm adds topic information. To evaluate the dataset, the authors demonstrate a question-answering task similar to the CORD-19 Kaggle challenge. As a further evaluation, they perform sequential sentence classification on each paper’s abstract using a pre-trained BERT-PubMed layer. The dataset contains 30k research papers and can support NLP research on COVID-19 vaccines, including text mining, information extraction, and question answering.
Low | GrooveSquid.com (original content) | Low Difficulty Summary
This paper creates a special database of articles about COVID-19 vaccines. It’s like a big library that scientists can use to find important information. The authors took existing data and added new details to make it more useful. The computer program fastText helps figure out which language each article is written in, and other tools help identify who the authors are and where they work. The authors also pulled out keywords and topics for each article. To test their database, they tried a question-answering task similar to an existing online challenge. They’re sharing this big collection of articles so scientists can use it to learn more about vaccines.
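
For readers who want to see what "adding columns" to a paper record might look like, here is a minimal Python sketch. It is not the authors’ code. The function names (`augment_record`, `frequency_keywords`, `placeholder_language`), the stopword list, and the sample record are all hypothetical. The keyword step uses a plain word-frequency count as a stand-in for YAKE, and the language step is a trivial placeholder where a trained model such as fastText would be called. The author-demographics and LDA topic steps are omitted.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def frequency_keywords(text, top_k=3):
    """Stand-in for a keyword extractor such as YAKE: rank non-stopwords by count."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    # Sort by descending count, then alphabetically, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def placeholder_language(text):
    """Hypothetical detector: a real pipeline would call a language-ID model here."""
    return "en" if text.isascii() else "unknown"


def augment_record(record, detect_language, extract_keywords):
    """Return a copy of a metadata row with added 'language' and 'keywords' columns."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".strip()
    augmented = dict(record)  # leave the original row untouched
    augmented["language"] = detect_language(text)
    augmented["keywords"] = extract_keywords(text)
    return augmented


# Made-up sample record for illustration only.
paper = {
    "title": "Vaccine efficacy in adults",
    "abstract": "We study vaccine efficacy and vaccine safety in adults.",
}
row = augment_record(paper, placeholder_language, frequency_keywords)
print(row["language"], row["keywords"])
```

Passing the detector and extractor in as functions keeps the record-building step separate from the models, so a real language-ID model or keyword extractor could be swapped in without changing `augment_record`.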

Keywords

» Artificial intelligence  » BERT  » Classification  » fastText  » NLP  » Question answering