Summary of SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding, by Ryan Sun et al.
SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding by Ryan Sun, Tianyi Zhou, Xun Chen, Lichao Sun First…
The Evolution of RWKV: Advancements in Efficient Language Modeling by Akul Datta First submitted to arXiv on:…
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference by Zheng Zhan, Yushu Wu, Yifan Gong, Zichong…
Average Controlled and Average Natural Micro Direct Effects in Summary Causal Graphs by Simon Ferreira, Charles…
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration by Dezhan Tu, Danylo…
Kernel Looping: Eliminating Synchronization Boundaries for Peak Inference Performance by David Koeplinger, Darshan Gandhi, Pushkar Nandkar,…
EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching by Xinwang Chen, Ning Liu, Yichen…
YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems by Mujadded Al Rabbani…
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference by Junqi Zhao,…
Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set by…