Summary of SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation, by Junjie Zhang et al.
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation, by Junjie Zhang, Chenjia Bai, …
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis, by Basile Van Hoorick, Rundi Wu, Ege …
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control, by Gunshi Gupta, Karmesh Yadav, Yarin …
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving, by Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei …
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM, by Navid Rajabi, Jana Kosecka. First submitted to …
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields, by Muhammad Zubair Irshad, …
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data, by Hanrong Ye, Dan Xu. First submitted …
Instance-Warp: Saliency Guided Image Warping for Unsupervised Domain Adaptation, by Shen Zheng, Anurag Ghosh, Srinivasa G. …
Learning 3D object-centric representation through prediction, by John Day, Tushar Arora, Jirui Liu, Li Erran Li, …
OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding, by Francis Engelmann, Ayca Takmaz, Jonas Schult, …