Summary of WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting, by Olly Styles et al.
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting by Olly Styles, Sam Miller,…