Summary of BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers with Limited Data, by Jean-Loup Tastet et al.
By Jean-Loup Tastet, Inar Timiryasov. First submitted to…