Summary of Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models, by Siqi Wang et al.
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information by Emanuel Buttaci, …
On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory by Guhan Chen, …
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe by Yuxin Xiao, Shujian Zhang, Wenxuan Zhou, …
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment by Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, …
Next state prediction gives rise to entangled, yet compositional representations of objects by Tankred Saanum, Luca…
Failure-Proof Non-Contrastive Self-Supervised Learning by Emanuele Sansone, Tim Lebailly, Tinne Tuytelaars
Collaboration! Towards Robust Neural Methods for Routing Problems by Jianan Zhou, Yaoxin Wu, Zhiguang Cao, Wen…
DEPT: Decoupled Embeddings for Pre-training Language Models by Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F.…
DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects by Nidhi Mathihalli, …