Summary of Exploiting Student Parallelism for Efficient GPU Inference of BERT-like Models in Online Services, by Weiyan Wang et al.