Summary of WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More, by Yuxuan Yue et al.
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More by Yuxuan Yue, Zhihang…
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization by Jiyao Li, Mingze Ni, Yifei Dong, Tianqing…
What Evidence Do Language Models Find Convincing? by Alexander Wan, Eric Wallace, Dan Klein. First submitted to…
In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness by Liam Collins, Advait Parulekar, Aryan…
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs by Kenneth Li, Tianle Liu, Naomi Bashkansky,…
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the…
An end-to-end attention-based approach for learning on graphs by David Buterez, Jon Paul Janet, Dino Oglic,…
Can Transformers Predict Vibrations? by Fusataka Kuniyoshi, Yoshihide Sawada. First submitted to arXiv on: 16 Feb 2024. Categories: Main:…
Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling by Ivan Marisca, Cesare Alippi, Filippo Maria Bianchi. First…
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise…