Summary of OpenChemIE: An Information Extraction Toolkit For Chemistry Literature, by Vincent Fan and Yujie Qian and Alex Wang and Amber Wang and Connor W. Coley and Regina Barzilay
OpenChemIE: An Information Extraction Toolkit For Chemistry Literature
by Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, Regina Barzilay
First submitted to arXiv on: 1 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The original abstract (available on arXiv) |
Medium | GrooveSquid.com (original content) | Information extraction from chemistry literature is crucial for building up-to-date reaction databases. Complete extraction requires combining information across text, tables, and figures, yet existing work has mainly focused on extracting reactions from a single modality. This paper presents OpenChemIE, which tackles this challenge by extracting reaction data at the document level. OpenChemIE first uses specialized neural models to extract relevant information from each modality and then integrates the results with chemistry-informed algorithms. The individual models achieve state-of-the-art performance, and the full pipeline reaches an F1 score of 69.5% on a challenging annotated dataset of reaction schemes. In addition, OpenChemIE's reaction extraction results attain an accuracy of 64.3% when compared directly against the Reaxys chemical database. OpenChemIE is released as an open-source package and is also available through a public web interface. |
Low | GrooveSquid.com (original content) | Imagine having access to all the information about chemical reactions in scientific papers! This matters because it helps us understand and predict how chemicals react with each other. The problem is that this information is scattered across different parts of a paper: text, tables, and pictures. This paper introduces a new tool called OpenChemIE that pulls information from all of these sources and combines it into one list of reactions. It uses specialized computer models, each built for a particular part of the job. The tool performed well on a challenging test dataset, and its results reached 64.3% accuracy when checked against a widely used chemical database. The researchers have made the tool freely available, which could help scientists and students work more effectively. |
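The medium-difficulty summary reports an F1 score of 69.5%. As a reminder of what that metric measures, here is a minimal Python sketch that computes precision, recall, and F1 over sets of extracted reactions. It is a sketch under stated assumptions, not the paper's evaluation code: it assumes each reaction is represented as unordered sets of reactant and product SMILES strings that are already canonicalized, and that a prediction counts only if it exactly matches a gold reaction. The paper's actual matching criteria may differ, and `make_reaction` and `reaction_f1` are hypothetical helpers, not part of the OpenChemIE package.

```python
"""Sketch: precision, recall, and F1 for a set of extracted reactions.

Assumptions (not from the paper): a reaction is a pair of frozensets of
pre-canonicalized SMILES strings (reactants, products), and a predicted
reaction is correct only if it exactly matches a gold reaction.
"""
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: (reactant SMILES set, product SMILES set)
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def make_reaction(reactants: List[str], products: List[str]) -> Reaction:
    """Build an order-insensitive reaction key from lists of SMILES strings."""
    return frozenset(reactants), frozenset(products)


def reaction_f1(predicted: Set[Reaction], gold: Set[Reaction]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact set matching."""
    true_positives = len(predicted & gold)  # predictions that match a gold reaction
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {
        make_reaction(["CCO", "CC(=O)O"], ["CCOC(C)=O", "O"]),      # esterification
        make_reaction(["c1ccccc1", "BrBr"], ["Brc1ccccc1", "Br"]),  # bromination
    }
    predicted = {
        make_reaction(["CC(=O)O", "CCO"], ["O", "CCOC(C)=O"]),  # same as gold, reordered
        make_reaction(["c1ccccc1"], ["Brc1ccccc1"]),            # missing species -> no match
    }
    p, r, f = reaction_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Running it prints `precision=0.50 recall=0.50 F1=0.50`: one of the two predictions matches a gold reaction, and one of the two gold reactions is recovered. Using frozensets makes the comparison insensitive to the order in which reactants and products are listed; a real evaluation would also need to canonicalize SMILES (for example with a cheminformatics toolkit) before comparing strings.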
Keywords
- Artificial intelligence
- F1 score