Summary of The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs, by Bocheng Chen et al.
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs, by Bocheng Chen,…
Chatting Up Attachment: Using LLMs to Predict Adult Bonds, by Paulo Soares, Sean McCurdy, Andrew J.…
Does Alignment Tuning Really Break LLMs’ Internal Confidence? by Hongseok Oh, Wonseok Hwang. First submitted to arXiv…
Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data, by Juncheng Xie, Shensian…
Iterative Graph Alignment, by Fangyuan Yu, Hardeep Singh Arora, Matt Johnson. First submitted to arXiv on: 29…
Learning Harmonized Representations for Speculative Sampling, by Lefan Zhang, Xiaodan Wang, Yanhua Huang, Ruiwen Xu. First submitted…
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation, by Lujun Gui, Bin Xiao,…
UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function, by Zhichao…
SurGen: Text-Guided Diffusion Model for Surgical Video Generation, by Joseph Cho, Samuel Schmidgall, Cyril Zakka, Mrudang…
Selective Preference Optimization via Token-Level Reward Function Estimation, by Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin…