Summary of MedCalc-Bench: Evaluating Large Language Models for Medical Calculations, by Nikhil Khandekar et al.
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations by Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren…