Summary of UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function, by Zhichao Wang et al.
BongLLaMA: LLaMA for Bangla Language by Abdullah Khan Zehady, Safi Al Mamun, Naymul Islam, Santu Karmaker. First…
Layer by Layer: Uncovering Where Multi-Task Learning Happens in Instruction-Tuned Large Language Models by Zheng Zhao, …
Influential Language Data Selection via Gradient Trajectory Pursuit by Zhiwei Deng, Tao Li, Yang Li. First submitted…
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch by Caigao Jiang, Xiang Shu, …
Data Quality Control in Federated Instruction-tuning of Large Language Models by Yaxin Du, Rui Ye, Fengting…
TSDS: Data Selection for Task-Specific Model Finetuning by Zifan Liu, Amin Karbasi, Theodoros Rekatsinas. First submitted to…
Federated Data-Efficient Instruction Tuning for Large Language Models by Zhen Qin, Zhaomin Wu, Bingsheng He, Shuiguang…
Large Continual Instruction Assistant by Jingyang Qiao, Zhizhong Zhang, Xin Tan, Yanyun Qu, Shouhong Ding, Yuan…
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance by Sachin Goyal, Christina Baek, …