Summary of MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models, by Wentian Wang et al.
MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models by Wentian Wang, Sarthak Jain,…
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis by Lin Fan, Xun Gong, Cenyang…
Understanding Finetuning for Factual Knowledge Extraction by Gaurav Ghosal, Tatsunori Hashimoto, Aditi Raghunathan. First submitted to arXiv…
TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language…
SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages by Gayane Ghazaryan, Erik Arakelyan, Pasquale Minervini,…
Temporal Knowledge Graph Question Answering: A Survey by Miao Su, Zixuan Li, Zhuo Chen, Long Bai,…
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation by Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna…
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering…
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks by Ihor Stepanov, Mykhailo Shtopko. First submitted…
Towards Understanding Domain Adapted Sentence Embeddings for Document Retrieval by Sujoy Roychowdhury, Sumit Soman, H. G.…