Paper List
We recommend you use the search box as this list is very long.
-
Summary of A Local Information Aggregation Based Multi-agent Reinforcement Learning For Robot Swarm Dynamic Task Allocation, by Yang Lv et al.
-
Summary of Knowledge Management For Automobile Failure Analysis Using Graph Rag, by Yuta Ojima et al.
-
Summary of Training Agents with Weakly Supervised Feedback From Large Language Models, by Dihong Gong et al.
-
Summary of Great: Geometry-intention Collaborative Inference For Open-vocabulary 3d Object Affordance Grounding, by Yawen Shao et al.
-
Summary of Chinesewebtext 2.0: Large-scale High-quality Chinese Web Text with Multi-dimensional and Fine-grained Information, by Wanyue Zhang et al.
-
Summary of Pddlfuse: a Tool For Generating Diverse Planning Domains, by Vedant Khandelwal et al.
-
Summary of Handling Irresolvable Conflicts in the Semantic Web: An Rdf-based Conflict-tolerant Version Of the Deontic Traditional Scheme, by Livio Robaldo and Gianluca Pozzato
-
Summary of Sims: Simulating Stylized Human-scene Interactions with Retrieval-augmented Script Generation, by Wenjia Wang et al.
-
Summary of Planning Vs Reasoning: Ablations to Test Capabilities Of Lora Layers, by Neel Redkar
-
Summary of Improving Medical Diagnostics with Vision-language Models: Convex Hull-based Uncertainty Analysis, by Ferhat Ozgur Catak and Murat Kuzlu and Taylor Patrick
-
Summary of Mosabench: Multi-object Sentiment Analysis Benchmark For Evaluating Multimodal Large Language Models Understanding Of Complex Image, by Shezheng Song et al.
-
Summary of Diffguard: Text-based Safety Checker For Diffusion Models, by Massine El Khader et al.
-
Summary of Addressing Vulnerabilities in Ai-image Detection: Challenges and Proposed Solutions, by Justin Jiang
-
Summary of Graph Canvas For Controllable 3d Scene Generation, by Libin Liu and Shen Chen and Sen Jia and Jingzhe Shi and Zhongyu Jiang and Can Jin and Wu Zongkai and Jenq-neng Hwang and Lei Li
-
Summary of Scenetap: Scene-coherent Typographic Adversarial Planner Against Vision-language Models in Real-world Environments, by Yue Cao et al.
-
Summary of Dspy-based Neural-symbolic Pipeline to Enhance Spatial Reasoning in Llms, by Rong Wang et al.
-
Summary of Cross-modal Information Flow in Multimodal Large Language Models, by Zhi Zhang et al.
-
Summary of Scaleviz: Scaling Visualization Recommendation Models on Large Data, by Ghazi Shazan Ahmad et al.
-
Summary of Dhcp: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-language Models, by Yudong Zhang et al.
-
Summary of On the Effectiveness Of Incremental Training Of Large Language Models, by Miles Q. Li et al.
-
Summary of Generative Visual Communication in the Era Of Vision-language Models, by Yael Vinker
-
Summary of Gaussianspeech: Audio-driven Gaussian Avatars, by Shivangi Aneja et al.
-
Summary of The Performance Of the Lstm-based Code Generated by Large Language Models (llms) in Forecasting Time Series Data, by Saroj Gopali et al.
-
Summary of Covis: a Collaborative Framework For Fine-grained Graphic Visual Understanding, by Xiaoyu Deng et al.
-
Summary of Newsedits 2.0: Learning the Intentions Behind Updating News, by Alexander Spangher et al.
-
Summary of Devising a Set Of Compact and Explainable Spoken Language Feature For Screening Alzheimer’s Disease, by Junan Li et al.
-
Summary of Ezsql: An Sql Intermediate Representation For Improving Sql-to-text Generation, by Meher Bhardwaj et al.
-
Summary of Scratcheval: Are Gpt-4o Smarter Than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges, by Rao Fu et al.
-
Summary of Ustcctsu at Semeval-2024 Task 1: Reducing Anisotropy For Cross-lingual Semantic Textual Relatedness Task, by Jianjian Li et al.
-
Summary of Mars-po: Multi-agent Reasoning System Preference Optimization, by Xiaoxuan Lou et al.
-
Summary of Objectrelator: Enabling Cross-view Object Relation Understanding in Ego-centric and Exo-centric Videos, by Yuqian Fu et al.
-
Summary of Way to Specialist: Closing Loop Between Specialized Llm and Evolving Domain Knowledge Graph, by Yutong Zhang et al.
-
Summary of Msg Score: a Comprehensive Evaluation For Multi-scene Video Generation, by Daewon Yoon et al.
-
Summary of Hot3d: Hand and Object Tracking in 3d From Egocentric Multi-view Videos, by Prithviraj Banerjee et al.
-
Summary of Sowing Information: Cultivating Contextual Coherence with Mllms in Image Generation, by Yuhan Pei and Ruoyu Wang and Yongqi Yang and Ye Zhu and Olga Russakovsky and Yu Wu
-
Summary of Simulating Tabular Datasets Through Llms to Rapidly Explore Hypotheses About Real-world Entities, by Miguel Zabaleta et al.
-
Summary of Dumapper: Towards Automatic Verification Of Large-scale Pois with Street Views at Baidu Maps, by Miao Fan et al.
-
Summary of Monopoly: Learning to Price Public Facilities For Revaluing Private Properties with Large-scale Urban Data, by Miao Fan et al.
-
Summary of A Survey on Cutting-edge Relation Extraction Techniques Based on Language Models, by Jose A. Diaz-garcia and Julio Amador Diaz Lopez
-
Summary of Abductive Symbolic Solver on Abstraction and Reasoning Corpus, by Mintaek Lim et al.
-
Summary of From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects, by Zizhao Li et al.
-
Summary of Pdzseg: Adapting the Foundation Model For Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection, by Mengya Xu et al.
-
Summary of Timemarker: a Versatile Video-llm For Long and Short Video Understanding with Superior Temporal Localization Ability, by Shimin Chen et al.
-
Summary of Paths: a Hierarchical Transformer For Efficient Whole Slide Image Analysis, by Zak Buzzard et al.
-
Summary of Dependency-aware Cav Task Scheduling Via Diffusion-based Reinforcement Learning, by Xiang Cheng et al.
-
Summary of Thai Financial Domain Adaptation Of Thalle — Technical Report, by Kbtg Labs et al.
-
Summary of Large Language Model-brained Gui Agents: a Survey, by Chaoyun Zhang et al.
-
Summary of Mvketr: Chest Ct Report Generation with Multi-view Perception and Knowledge Enhancement, by Xiwei Deng et al.
-
Summary of Continual Learning in Machine Speech Chain Using Gradient Episodic Memory, by Geoffrey Tyndall et al.
-
Summary of Helvipad: a Real-world Dataset For Omnidirectional Stereo Depth Estimation, by Mehdi Zayene et al.
-
Summary of Is My Meeting Summary Good? Estimating Quality with a Multi-llm Evaluator, by Frederic Kirstein et al.
-
Summary of Gpt As Ghostwriter at the White House, by Jacques Savoy
-
Summary of Tryoffdiff: Virtual-try-off Via High-fidelity Garment Reconstruction Using Diffusion Models, by Riza Velioglu et al.
-
Summary of Draft Model Knows When to Stop: a Self-verification Length Policy For Speculative Decoding, by Ziyin Zhang and Jiahao Xu and Tian Liang and Xingyu Chen and Zhiwei He and Rui Wang and Zhaopeng Tu
-
Summary of Weakly Supervised Framework Considering Multi-temporal Information For Large-scale Cropland Mapping with Satellite Imagery, by Yuze Wang et al.
-
Summary of Bpp-search: Enhancing Tree Of Thought Reasoning For Mathematical Modeling Problem Solving, by Teng Wang et al.
-
Summary of Advancing Uncertain Combinatorics Through Graphization, Hyperization, and Uncertainization: Fuzzy, Neutrosophic, Soft, Rough, and Beyond, by Takaaki Fujita
-
Summary of Spatially Visual Perception For End-to-end Robotic Learning, by Travis Davies et al.
-
Summary of Wf-vae: Enhancing Video Vae by Wavelet-driven Energy Flow For Latent Video Diffusion Model, by Zongjian Li and Bin Lin and Yang Ye and Liuhan Chen and Xinhua Cheng and Shenghai Yuan and Li Yuan
-
Summary of Showui: One Vision-language-action Model For Gui Visual Agent, by Kevin Qinghong Lin et al.
-
Summary of A Bilayer Segmentation-recombination Network For Accurate Segmentation Of Overlapping C. Elegans, by Mengqian Dinga et al.
-
Summary of What’s in the Image? a Deep-dive Into the Vision Of Vision Language Models, by Omri Kaduri et al.
-
Summary of Stableanimator: High-quality Identity-preserving Human Image Animation, by Shuyuan Tu et al.
-
Summary of Uvcg: Leveraging Temporal Consistency For Universal Video Protection, by Kaizhou Li et al.
-
Summary of Mvboost: Boost 3d Reconstruction with Multi-view Refinement, by Xiangyu Liu et al.
-
Summary of Self-supervised Monocular Depth and Pose Estimation For Endoscopy with Generative Latent Priors, by Ziang Xu et al.
-
Summary of Svgdreamer++: Advancing Editability and Diversity in Text-guided Svg Generation, by Ximing Xing et al.
-
Summary of Arabic-nougat: Fine-tuning Vision Transformers For Arabic Ocr and Markdown Extraction, by Mohamed Rashad
-
Summary of Hoppr Medical-grade Platform For Medical Imaging Ai, by Kalina P. Slavkova et al.
-
Summary of Evaluating Generative Ai-enhanced Content: a Conceptual Framework Using Qualitative, Quantitative, and Mixed-methods Approaches, by Saman Sarraf
-
Summary of Can Llms Plan Paths in the Real World?, by Wanyi Chen et al.
-
Summary of A Novel Pareto-optimal Ranking Method For Comparing Multi-objective Optimization Algorithms, by Amin Ibrahim et al.
-
Summary of An End-to-end Two-stream Network Based on Rgb Flow and Representation Flow For Human Action Recognition, by Song-jiang Lai et al.
-
Summary of Vlm-hoi: Vision Language Models For Interpretable Human-object Interaction Analysis, by Donggoo Kang et al.
-
Summary of Personacraft: Personalized and Controllable Full-body Multi-human Scene Generation Using Occlusion-aware 3d-conditioned Diffusion, by Gwanghyun Kim et al.
-
Summary of Augmenting Multimodal Llms with Self-reflective Tokens For Knowledge-based Visual Question Answering, by Federico Cocchi et al.
-
Summary of Boundless Socratic Learning with Language Games, by Tom Schaul
-
Summary of Harnessing Llms For Educational Content-driven Italian Crossword Generation, by Kamyar Zeinalipour et al.
-
Summary of Teaching Smaller Language Models to Generalise to Unseen Compositional Questions (full Thesis), by Tim Hartill
-
Summary of G3d-lf: Generalizable 3d-language Feature Fields For Embodied Tasks, by Zihan Wang et al.
-
Summary of Path-rag: Knowledge-guided Key Region Retrieval For Open-ended Pathology Visual Question Answering, by Awais Naeem et al.
-
Summary of Advancing Content Moderation: Evaluating Large Language Models For Detecting Sensitive Content Across Text, Images, and Videos, by Nouar Aldahoul et al.
-
Summary of Doge: Towards Versatile Visual Document Grounding and Referring, by Yinan Zhou et al.
-
Summary of Llm-based Offline Learning For Embodied Agents Via Consistency-guided Reward Ensemble, by Yujeong Lee et al.
-
Summary of Chatgen: Automatic Text-to-image Generation From Freestyle Chatting, by Chengyou Jia et al.
-
Summary of Learning Monotonic Attention in Transducer For Streaming Generation, by Zhengrui Ma et al.
-
Summary of Strategic Prompting For Conversational Tasks: a Comparative Analysis Of Large Language Models Across Diverse Conversational Tasks, by Ratnesh Kumar Joshi et al.
-
Summary of Buffer Anytime: Zero-shot Video Depth and Normal From Image Priors, by Zhengfei Kuang et al.
-
Summary of Semantic Data Augmentation For Long-tailed Facial Expression Recognition, by Zijian Li et al.
-
Summary of Heie: Mllm-based Hierarchical Explainable Aigc Image Implausibility Evaluator, by Fan Yang et al.
-
Summary of Refine: a Reward-based Framework For Interpretable and Nuanced Evaluation Of Radiology Report Generation, by Yunyi Liu et al.
-
Summary of Towards Intention Recognition For Robotic Assistants Through Online Pomdp Planning, by Juan Carlos Saborio and Joachim Hertzberg
-
Summary of Different Bias Under Different Criteria: Assessing Bias in Llms with a Fact-based Approach, by Changgeon Ko et al.
-
Summary of Can Llms Be Good Graph Judger For Knowledge Graph Construction?, by Haoyu Huang et al.
-
Summary of Fairness and Performance in Harmony: Data Debiasing Is All You Need, by Junhua Liu and Wendy Wan Yee Hui and Roy Ka-wei Lee and Kwan Hui Lim
-
Summary of Do Automatic Factuality Metrics Measure Factuality? a Critical Evaluation, by Sanjana Ramprasad et al.
-
Summary of Dreamrunner: Fine-grained Compositional Story-to-video Generation with Retrieval-augmented Motion Adaptation, by Zun Wang et al.
-
Summary of Enhancing Llms For Power System Simulations: a Feedback-driven Multi-agent Framework, by Mengshuo Jia et al.
-
Summary of A Brief Summary Of Explanatory Virtues, by Ingrid Zukerman
-
Summary of Neuro-symbolic Evaluation Of Text-to-video Models Using Formal Verification, by S. P. Sharan et al.