Summary of DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation, by Xueqing Wu et al.
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
by Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang
First submitted to arXiv on: 4 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper introduces new resources and benchmarks for data analysis over tabular data. To build them, the authors use large language models (LLMs) with a multi-turn prompting technique to automatically generate high-quality answer annotations. The resulting dataset, DACO, consists of databases, query-answer pairs, and a test set with human-refined annotations. The authors train a model on DACO with supervised fine-tuning (SFT) and find that it learns reasonable data analysis capabilities. To better align the model with human preferences, they then apply reinforcement learning that encourages helpful answers. In human evaluation, the resulting algorithm, DACO-RL, produces more helpful answers than the SFT model in 57.72% of cases. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary The paper tries to help computers understand table data better. The authors build a special dataset called DACO with many different kinds of tables and questions about those tables. They also train a computer program to analyze this data and find good answers. Human judges rated the program's answers, and it did pretty well! This suggests that computers can help us understand table data better. |
Keywords
» Artificial intelligence » Alignment » Fine tuning » Prompting » Reinforcement learning » Supervised