Summary of SimPO: Simple Preference Optimization with a Reference-Free Reward, by Yu Meng et al.
SimPO: Simple Preference Optimization with a Reference-Free Reward, by Yu Meng, Mengzhou Xia, Danqi Chen. First submitted…
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence, by Minheng Xiao, Xian Yu,…
Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension, by…
Similarity-Navigated Conformal Prediction for Graph Neural Networks, by Jianqing Song, Jianguo Huang, Wenyu Jiang, Baoming Zhang,…
Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors, by Emile Pierret, Bruno Galerne. First submitted…
A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics, by Yijin Ni, Xiaoming Huo. First submitted to arXiv…
Probabilistic Inference in the Era of Tensor Networks and Differential Programming, by Martin Roa-Villescas, Xuanzhao Gao,…
Learning Latent Space Hierarchical EBM Diffusion Models, by Jiali Cui, Tian Han. First submitted to arXiv on:…
Next-token prediction capacity: general upper bounds and a lower bound for transformers, by Liam Madden, Curtis…
Accelerated Evaluation of Ollivier-Ricci Curvature Lower Bounds: Bridging Theory and Computation, by Wonwoo Kang, Heehyun Park. First…