Summary of Zyda: A 1.3T Dataset for Open Language Modeling, by Yury Tokpanov et al.
Zyda: A 1.3T Dataset for Open Language Modeling by Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan…
Multimodal Reasoning with Multimodal Knowledge Graph by Junlin Lee, Yequan Wang, Jing Li, Min Zhang. First submitted…
Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs by Vitor…
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization by Yu Zhang, Qi Zhang, Zixuan Gong,…
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models by Liang Zhao,…
Jina CLIP: Your CLIP Model Is Also Your Text Retriever by Andreas Koukounas, Georgios Mastrapas, Michael…
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice by Jian-Qiao Zhu, Haijiang…
JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization by Xiaobo Guo, Jay Desai, Srinivasan…
Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception by Xiaohao Xu, Ye…
A Survey of Multimodal Large Language Model from A Data-centric Perspective by Tianyi Bai, Hao Liang,…