Summary of Uncovering the Text Embedding in Text-to-Image Diffusion Models, by Hu Yu et al.
Uncovering the Text Embedding in Text-to-Image Diffusion Models by Hu Yu, Hao Luo, Fan Wang, Feng…
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward by Ruohong Zhang, Liangke…
FABLES: Evaluating faithfulness and content selection in book-length summarization by Yekyung Kim, Yapei Chang, Marzena Karpinska,…
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations by Deqing Fu, Ruohao Guo, Ghazal Khalighinejad, Ollie…
A Review of Multi-Modal Large Language and Vision Models by Kilian Carolan, Laura Fennelly, Alan F.…
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models by Yi-Lin Tuan, Xilun Chen,…
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model by Musashi Hinck, Matthew L. Olson,…
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection by Ziyi Zhou, Xiaoming Zhang, Litian…
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation by Rohan Chaudhury, Mihir Godbole, Aakash Garg,…
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model by Lirui Zhao, Yue Yang,…