Summary of Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching, by Jie Peng et al.