Summary of TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection, by Wei Wu et al.
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection by…