Summary of Zyda: A 1.3T Dataset for Open Language Modeling, by Yury Tokpanov et al.
Zyda: A 1.3T Dataset for Open Language Modeling by Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan…
Multimodal Reasoning with Multimodal Knowledge Graph by Junlin Lee, Yequan Wang, Jing Li, Min Zhang. First submitted…
Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs by Vitor…
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization by Yu Zhang, Qi Zhang, Zixuan Gong,…
LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models by Liang Zhao,…
Jina CLIP: Your CLIP Model Is Also Your Text Retriever by Andreas Koukounas, Georgios Mastrapas, Michael…
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice by Jian-Qiao Zhu, Haijiang…
JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization by Xiaobo Guo, Jay Desai, Srinivasan…
Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception by Xiaohao Xu, Ye…
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction by Yinda Chen, Haoyuan Shi, Xiaoyu Liu,…