Summary of A Bounding Box Is Worth One Token: Interleaving Layout and Text in a Large Language Model For Document Understanding, by Jinghui Lu et al.
Artemis: Towards Referential Understanding in Complex Videos, by Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie,…
AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation, by…
Towards Two-Stream Foveation-based Active Vision Learning, by Timur Ibrayev, Amitangshu Mukherjee, Sai Aparna Aketi, Kaushik Roy. First…
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations, by Bhishma Dedhia, Niraj K. Jha. First…
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes, by Zi-Kai Xiao, Guo-Ye Yang, Xue Yang, Tai-Jiang…
Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method, by Qiuhao Li,…
Boximator: Generating Rich and Controllable Motions for Video Synthesis, by Jiawei Wang, Yuchen Zhang, Jiaxin Zou,…
Improving the Detection of Small Oriented Objects in Aerial Images, by Chandler Timm C. Doloriel, Rhandley…
Improving Generalization Performance of YOLOv8 for Camera Trap Object Detection, by Aroj Subedi. First submitted to arXiv…