Summary of BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching, by Peizhuang Cong et al.
BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching, by Peizhuang Cong, Qizhi…