Summary of In-context Learning and Occam’s Razor, by Eric Elmoznino et al.
In-context learning and Occam’s razor, by Eric Elmoznino, Tom Marty, Tejas Kasetty, Leo Gagnon, Sarthak Mittal,…
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens, by Lijie Fan, Tianhong Li, Siyang Qin,…
Enhancing Generalization in Sparse Mixture of Experts Models: The Case for Increased Expert Activation in…
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs, by Tianyu Guo, Druv Pai, Yu Bai,…
Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum, by Nashrah Haque, Xiang Li, Zhehui…
In-context KV-Cache Eviction for LLMs via Attention-Gate, by Zihao Zeng, Bokai Lin, Tianqi Hou, Hao Zhang,…
Self-Supervised Learning of Disentangled Representations for Multivariate Time-Series, by Ching Chang, Chiao-Tung Chan, Wei-Yao Wang, Wen-Chih…
MoH: Multi-Head Attention as Mixture-of-Head Attention, by Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan. First submitted…
The Fair Language Model Paradox, by Andrea Pinto, Tomer Galanti, Randall Balestriero. First submitted to arxiv on:…
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure, by Yunfan Xiong, Ruoyu Zhang, Yanzeng Li,…