Summary of VideoAgent: Long-form Video Understanding with Large Language Model as Agent, by Xiaohan Wang et al.
VideoAgent: Long-form Video Understanding with Large Language Model as Agent, by Xiaohan Wang, Yuhui Zhang, Orr…
PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency, by Zhishuai Li, Xiang Wang, Jingjing Zhao,…
3D-VLA: A 3D Vision-Language-Action Generative World Model, by Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang,…
Masked Generative Story Transformer with Character Guidance and Caption Augmentation, by Christos Papadimitriou, Giorgos Filandrianos, Maria…
Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing…
ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation, by Shaojie Dai, Xin Liu, Ping Luo, Yue…
Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents, by Nishchal…
Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset…
TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision, by Ruiwen Zhou, Yingxuan Yang,…
Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents, by Jinyang Li, Nan Huo, Yan…