Inference – Page 13 – GrooveSquid.com

July 13, 2025

SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding by Ryan Sun, Tianyi Zhou, Xun Chen, Lichao Sun. First…

July 13, 2025

The Evolution of RWKV: Advancements in Efficient Language Modeling by Akul Datta. First submitted to arXiv on:…

July 13, 2025

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference by Zheng Zhan, Yushu Wu, Yifan Gong, Zichong…

July 13, 2025

Average Controlled and Average Natural Micro Direct Effects in Summary Causal Graphs by Simon Ferreira, Charles…

July 13, 2025

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration by Dezhan Tu, Danylo…

July 13, 2025

Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance by David Koeplinger, Darshan Gandhi, Pushkar Nandkar,…

July 13, 2025

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching by Xinwang Chen, Ning Liu, Yichen…

July 13, 2025

YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems by Mujadded Al Rabbani…

July 13, 2025

BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference by Junqi Zhao,…

July 13, 2025

Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set by…