Summary of Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning, by Ziyu Zhao et al.
Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning, by Ziyu Zhao, Leilei Gan, Guoyin Wang,…
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees, by Yuhui Li, Fangyun Wei, Chao…
Flexible Tails for Normalizing Flows, by Tennessee Hickling, Dennis Prangle. First submitted to arXiv on: 22 Jun…
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, by Chao Lou,…
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, by Sean Welleck, Amanda Bertsch, Matthew…
VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data, by Paul D. W. Kirk,…
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods, by Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang,…
Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning, by Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan…
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and…
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration, by…