Summary of Multi-scale Temporal Difference Transformer For Video-text Retrieval, by Ni Wang et al.
Multi-Scale Temporal Difference Transformer for Video-Text Retrieval by Ni Wang, Dongliang Liao, Xing Xu. First submitted to…
Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels by Zixia Jia,…
Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement by Zhiyuan Chang, Mingyang Li, Junjie…
Video-Infinity: Distributed Long Video Generation by Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang. First submitted to…
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other by Yifei Gao, Jie Ou, Lei…
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments by Zixia Jia,…
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos by Yuting Mei, Linli Yao, Qin…
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging by Deyuan Liu, Zhanyue Qin,…
Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models by…
Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks by Daniel Wen, Nafisa Hussain. First submitted…