Summary of ZACK: Zero-Overhead LLM Inference Acceleration via Dimensionality Compression of the Key-Value Cache, by Zeyu Zhang et al.
ZACK: Zero-Overhead LLM Inference Acceleration via Dimensionality Compression of the Key-Value Cache, by Zeyu Zhang, Haiying…