Summary of Semi-Supervised Reward Modeling via Iterative Self-Training, by Yifei He et al.
Semi-Supervised Reward Modeling via Iterative Self-Training, by Yifei He, Haoxiang Wang, Ziyan Jiang, Alexandros Papangelis, Han…