Summary of Can Language Models Explain Their Own Classification Behavior?, by Dane Sherburn et al.
Can Language Models Explain Their Own Classification Behavior? by Dane Sherburn, Bilal Chughtai, Owain Evans. First submitted…