Summary of RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, by Aleksandar Botev et al.
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, by Aleksandar Botev, Soham De, Samuel L…