Summary of A Survey on Large Language Model Acceleration Based on KV Cache Management, by Haoyang Li et al.
A Survey on Large Language Model Acceleration based on KV Cache Management, by Haoyang Li, Yiming…
MBQ: Modality-Balanced Quantization for Large Vision-Language Models, by Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu,…
Progressive Fine-to-Coarse Reconstruction for Accurate Low-Bit Post-Training Quantization in Vision Transformers, by Rui Ding, Liang Yong,…
Efficient Quantization-Aware Training on Segment Anything Model in Medical Images and Its Deployment, by Haisheng Lu,…
Compression for Better: A General and Stable Lossless Compression Framework, by Boyang Zhang, Daning Cheng, Yunquan…
QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos, by Sharath Girish, Tianye Li,…
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation, by Liao Qu, Huichao Zhang, Yiheng Liu,…
Scaling Image Tokenizers with Grouped Spherical Quantization, by Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao…
Scalable Image Tokenization with Index Backpropagation Quantization, by Fengyuan Shi, Zhuoyan Luo, Yixiao Ge, Yujiu Yang,…
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads, by Siqi Kou, Jiachun Jin, Chang Liu, Ye…