Summary of ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis, by Zanlin Ni et al.
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis by Zanlin Ni, Yulin Wang, Renping Zhou, Yizeng…
Conditional [MASK] Discrete Diffusion Language Model by Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin…
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior by Hanyu Wang, Saksham Suri, Yixuan Ren,…
Non-myopic Generation of Language Models for Reasoning and Planning by Chang Ma, Haiteng Zhao, Junlei Zhang,…
Uncovering Autoregressive LLM Knowledge of Thematic Fit in Event Representation by Safeyah Khaled Alshemali, Daniel Bauer,…
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation by Chengyue Wu, Xiaokang Chen, Zhiyu…
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation by Hanbo Cheng,…
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective by Yongxin Zhu, Bocheng Li,…
Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification? by Gabriel Roccabruna, Massimo Rizzoli, Giuseppe…
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models by Jun Wang, Meng…