Summary of Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model, by Xiaolong Li et al.
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model, by Xiaolong Li, Jiawei Mo, Ying…
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving, by Jiehui Huang, Xiao Dong, Wenhui Song, Zheng…
A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation, by…
Raformer: Redundancy-Aware Transformer for Video Wire Inpainting, by Zhong Ji, Yimu Su, Yan Zhang, Jiacheng Hou,…
MaGGIe: Masked Guided Gradual Human Instance Matting, by Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young…
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning, by Weifeng Chen, Jiacheng Zhang, Jie Wu,…
PriorNet: A Novel Lightweight Network with Multidimensional Interactive Attention for Efficient Image Dehazing, by Yutong Chen,…
Ada-DF: An Adaptive Label Distribution Fusion Network For Facial Expression Recognition, by Shu Liu, Yan Xu,…
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision, by Ankit Vani, Bac Nguyen,…
A review of deep learning-based information fusion techniques for multimodal medical image classification, by Yihao Li,…