Summary of Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery, by Yuxuan Zhou et al.
Reliable and diverse evaluation of LLM medical knowledge mastery by Yuxuan Zhou, Xien Liu, Chen Ning,…
UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation by Ting Yu Tsai, Li Lin, Shu Hu, Connie W.…
LLMs are One-Shot URL Classifiers and Explainers by Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne. First…
Do language models practice what they preach? Examining language ideologies about gendered language reform encoded…
Generative AI Carries Non-Democratic Biases and Stereotypes: Representation of Women, Black Individuals, Age Groups, and…
Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer…
LLM for Everyone: Representing the Underrepresented in Large Language Models by Samuel Cahyawijaya. First submitted to arxiv…
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data by Zhao Cheng, Diane Wan, Matthew…
Nonlinear Inverse Design of Mechanical Multi-Material Metamaterials Enabled by Video Denoising Diffusion and Structure Identifier by…
Measuring Error Alignment for Decision-Making Systems by Binxia Xu, Antonis Bikakis, Daniel Onah, Andreas Vlachidis, Luke…