Transformer – Page 96 – GrooveSquid.com

July 13, 2025

Summary of LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models, by David Hoffmann et al.

LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models, by David Hoffmann, Kailash Budhathoki, Matthaeus…

July 13, 2025

Summary of Precipitation Nowcasting Using Diffusion Transformer with Causal Attention, by Chaorong Li et al.

Precipitation Nowcasting Using Diffusion Transformer with Causal Attention, by ChaoRong Li, XuDong Ling, YiLan Xue, Wenjie…

July 13, 2025

Summary of An Evolved Universal Transformer Memory, by Edoardo Cetin et al.

An Evolved Universal Transformer Memory, by Edoardo Cetin, Qi Sun, Tianyu Zhao, Yujin Tang. First submitted to…

July 13, 2025

Summary of Estimating the Probabilities of Rare Outputs in Language Models, by Gabriel Wu et al.

Estimating the Probabilities of Rare Outputs in Language Models, by Gabriel Wu, Jacob Hilton. First submitted to…

July 13, 2025

Summary of Hypothesis Testing the Circuit Hypothesis in LLMs, by Claudia Shi et al.

Hypothesis Testing the Circuit Hypothesis in LLMs, by Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng,…

July 13, 2025

Summary of AERO: Softmax-Only LLMs for Efficient Private Inference, by Nandan Kumar Jha and Brandon Reagen

AERO: Softmax-Only LLMs for Efficient Private Inference, by Nandan Kumar Jha, Brandon Reagen. First submitted to arXiv…

July 13, 2025

Summary of Cliqueformer: Model-Based Optimization with Structured Transformers, by Jakub Grudzien Kuba et al.

Cliqueformer: Model-Based Optimization with Structured Transformers, by Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine. First submitted to…

July 13, 2025

Summary of Context-Scaling versus Task-Scaling in In-Context Learning, by Amirhesam Abedsoltan et al.

Context-Scaling versus Task-Scaling in In-Context Learning, by Amirhesam Abedsoltan, Adityanarayanan Radhakrishnan, Jingfeng Wu, Mikhail Belkin. First submitted…

July 13, 2025

Summary of RecurFormer: Not All Transformer Heads Need Self-Attention, by Ruiqing Yan et al.

RecurFormer: Not All Transformer Heads Need Self-Attention, by Ruiqing Yan, Linghan Zheng, Xingbo Du, Han Zou,…

July 13, 2025

Summary of Tracking Universal Features Through Fine-Tuning and Model Merging, by Niels Horn and Desmond Elliott

Tracking Universal Features Through Fine-Tuning and Model Merging, by Niels Horn, Desmond Elliott. First submitted to arXiv…