Summary of Entropy-Regularized Process Reward Model, by Hanning Zhang et al.
Entropy-Regularized Process Reward Model by Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze…
DisCo-DSO: Coupling Discrete and Continuous Optimization for Efficient Generative Design in Hybrid Spaces by Jacob F.…
DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting by Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan…
Fully Test-time Adaptation for Tabular Data by Zhi Zhou, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li. First submitted…
Adaptive Quantization Resolution and Power Control for Federated Learning over Cell-free Networks by Afsaneh Mahmoudi, Emil…
Exploring Grokking: Experimental and Mechanistic Investigations by Hu Qiye, Zhou Hao, Yu RuoXi. First submitted to arXiv…
Memory-Efficient 4-bit Preconditioned Stochastic Optimization by Jingyang Li, Kuangyu Ding, Kim-Chuan Toh, Pan Zhou. First submitted to…
Doubly-Bounded Queue for Constrained Online Learning: Keeping Pace with Dynamics of Both Loss and Constraint by…
Explainable Fuzzy Neural Network with Multi-Fidelity Reinforcement Learning for Micro-Architecture Design Space Exploration by Hanwei Fan,…
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation by Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng…