Summary of Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot, by Zixuan Wang et al.
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot, by Zixuan Wang, Stanley Wei, Daniel…
Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems, by Jiawei Zhang, Jiaxin Zhuang,…
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes, by Dan Qiao,…
Latent Diffusion Model-Enabled Low-Latency Semantic Communication in the Presence of Semantic Ambiguities and Wireless Channel…
Differentiable Combinatorial Scheduling at Scale, by Mingju Liu, Yingjie Li, Jiaqi Yin, Zhiru Zhang, Cunxi Yu. First…
An Improved Empirical Fisher Approximation for Natural Gradient Descent, by Xiaodong Wu, Wenyi Yu, Chao Zhang,…
Computational and Statistical Guarantees for Tensor-on-Tensor Regression with Tensor Train Decomposition, by Zhen Qin, Zhihui Zhu. First…
Symmetric Matrix Completion with ReLU Sampling, by Huikang Liu, Peng Wang, Longxiu Huang, Qing Qu, Laura…
Adversarial flows: A gradient flow characterization of adversarial attacks, by Lukas Weigand, Tim Roith, Martin Burger. First…
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes, by Si Yi Meng,…