Paper List
We recommend you use the search box as this list is very long.
-
Summary of Steering Away From Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks, by Han Wang et al.
-
Summary of Emotivetalk: Expressive Talking Head Generation Through Audio Information Decoupling and Emotional Video Diffusion, by Haotian Wang et al.
-
Summary of Chemsafetybench: Benchmarking Llm Safety on Chemistry Domain, by Haochen Zhao et al.
-
Summary of Gradient-guided Parameter Mask For Multi-scenario Image Restoration Under Adverse Weather, by Jilong Guo et al.
-
Summary of Document Haystacks: Vision-language Reasoning Over Piles Of 1000+ Documents, by Jun Chen et al.
-
Summary of Followgen: a Scaled Noise Conditional Diffusion Model For Car-following Trajectory Prediction, by Junwei You et al.
-
Summary of Is ‘right’ Right? Enhancing Object Orientation Understanding in Multimodal Language Models Through Egocentric Instruction Tuning, by Ji Hyeok Jung et al.
-
Summary of Background-aware Defect Generation For Robust Industrial Anomaly Detection, by Youngjae Cho et al.
-
Summary of Gemex: a Large-scale, Groundable, and Explainable Medical Vqa Benchmark For Chest X-ray Diagnosis, by Bo Liu et al.
-
Summary of Unipose: a Unified Multimodal Framework For Human Pose Comprehension, Generation and Editing, by Yiheng Li et al.
-
Summary of What Can Llm Tell Us About Cities?, by Zhuoheng Li et al.
-
Summary of Magic-slam: Multi-agent Gaussian Globally Consistent Slam, by Vladimir Yugay et al.
-
Summary of Enhancing Answer Reliability Through Inter-model Consensus Of Large Language Models, by Alireza Amiri-margavi et al.
-
Summary of Human Motion Instruction Tuning, by Lei Li and Sen Jia and Wang Jianhao and Zhongyu Jiang and Feng Zhou and Ju Dai and Tianfang Zhang and Wu Zongkai and Jenq-neng Hwang
-
Summary of Fine-tuning Llms with Noisy Data For Political Argument Generation and Post Guidance, by Svetlana Churina et al.
-
Summary of Local and Global Feature Attention Fusion Network For Face Recognition, by Wang Yu et al.
-
Summary of End-to-end Steering For Autonomous Vehicles Via Conditional Imitation Co-learning, by Mahmoud M. Kishky et al.
-
Summary of Enhancing Multi-agent Consensus Through Third-party Llm Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models, by Zhihua Duan et al.
-
Summary of Salova: Segment-augmented Long Video Assistant For Targeted Retrieval and Routing in Long-form Video Analysis, by Junho Kim et al.
-
Summary of Probing For Consciousness in Machines, by Mathis Immertreu et al.
-
Summary of Diagnosis Of Diabetic Retinopathy Using Machine Learning & Deep Learning Technique, by Eric Shah et al.
-
Summary of Bayling 2: a Multilingual Large Language Model with Efficient Language Alignment, by Shaolei Zhang et al.
-
Summary of Brain-like Emergent Properties in Deep Networks: Impact Of Network Architecture, Datasets and Training, by Niranjan Rajesh et al.
-
Summary of One Diffusion to Generate Them All, by Duong H. Le et al.
-
Summary of Adapter-based Approaches to Knowledge-enhanced Language Models — a Survey, by Alexander Fichtl et al.
-
Summary of Synthesising Handwritten Music with Gans: a Comprehensive Evaluation Of Cyclewgan, Progan, and Dcgan, by Elona Shatri et al.
-
Summary of A Study on Unsupervised Domain Adaptation For Semantic Segmentation in the Era Of Vision-language Models, by Manuel Schwonberg et al.
-
Summary of Topv-nav: Unlocking the Top-view Spatial Reasoning Potential Of Mllm For Zero-shot Object Navigation, by Linqing Zhong et al.
-
Summary of When Babies Teach Babies: Can Student Knowledge Sharing Outperform Teacher-guided Distillation on Small Datasets?, by Srikrishna Iyer
-
Summary of Robospatial: Teaching Spatial Understanding to 2d and 3d Vision-language Models For Robotics, by Chan Hee Song et al.
-
Summary of O1 Replication Journey — Part 2: Surpassing O1-preview Through Simple Distillation, Big Progress or Bitter Lesson?, by Zhen Huang et al.
-
Summary of From Generation to Judgment: Opportunities and Challenges Of Llm-as-a-judge, by Dawei Li et al.
-
Summary of F — a Model Of Events Based on the Foundational Ontology Dolce+dns Ultralite, by Ansgar Scherp et al.
-
Summary of Imperceptible Adversarial Examples in the Physical World, by Weilin Xu et al.
-
Summary of “all That Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations, by Michael Hardy
-
Summary of Ontology-constrained Generation Of Domain-specific Clinical Summaries, by Gaya Mehenni and Amal Zouaq
-
Summary of Ramie: Retrieval-augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements, by Zaifu Zhan et al.
-
Summary of Ltcf-net: a Transformer-enhanced Dual-channel Fourier Framework For Low-light Image Restoration, by Gaojing Zhang and Jinglun Feng
-
Summary of Fasttracktr: Towards Fast Multi-object Tracking with Transformers, by Pan Liao et al.
-
Summary of Peng: Pose-enhanced Geo-localisation, by Tavis Shore et al.
-
Summary of Decoding Urban Industrial Complexity: Enhancing Knowledge-driven Insights Via Industryscopegpt, by Siqi Wang et al.
-
Summary of Do Llms Really Think Step-by-step in Implicit Reasoning?, by Yijiong Yu
-
Summary of Creating Scalable Agi: the Open General Intelligence Framework, by Daniel A. Dollinger et al.
-
Summary of Highly Efficient and Unsupervised Framework For Moving Object Detection in Satellite Videos, by C. Xiao et al.
-
Summary of Generative Prompt Internalization, by Haebin Shin et al.
-
Summary of Deep Learning For Automated Multi-scale Functional Field Boundaries Extraction Using Multi-date Sentinel-2 and Planetscope Imagery: Case Study Of Netherlands and Pakistan, by Saba Zahid et al.
-
Summary of Drive: Dual-robustness Via Information Variability and Entropic Consistency in Source-free Unsupervised Domain Adaptation, by Ruiqiang Xiao et al.
-
Summary of From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles Against Rare Events, by Yan Miao et al.
-
Summary of Unitedvln: Generalizable Gaussian Splatting For Continuous Vision-language Navigation, by Guangzhao Dai et al.
-
Summary of Enclip: Ensembling and Clustering-based Contrastive Language-image Pretraining For Fashion Multimodal Search with Limited Data and Low-quality Images, by Prithviraj Purushottam Naik et al.
-
Summary of Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models, by Donggeun Ko et al.
-
Summary of Llm Augmentations to Support Analytical Reasoning Over Multiple Documents, by Raquib Bin Yousuf et al.
-
Summary of Med-persam: One-shot Visual Prompt Tuning For Personalized Segment Anything Model in Medical Domain, by Hangyul Yoon et al.
-
Summary of Cia: Controllable Image Augmentation Framework Based on Stable Diffusion, by Mohamed Benkedadra et al.
-
Summary of Regulator-manufacturer Ai Agents Modeling: Mathematical Feedback-driven Multi-agent Llm Framework, by Yu Han and Zekun Guo
-
Summary of Unigaussian: Driving Scene Reconstruction From Multiple Camera Models Via Unified Gaussian Representations, by Yuan Ren et al.
-
Summary of Designing Cellular Manufacturing System in Presence Of Alternative Process Plans, by Md. Kutub Uddin et al.
-
Summary of Exploiting Watermark-based Defense Mechanisms in Text-to-image Diffusion Models For Unauthorized Data Usage, by Soumil Datta et al.
-
Summary of Exploring Large Language Models For Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions, by Shezheng Song
-
Summary of Gradient-free Classifier Guidance For Diffusion Model Sampling, by Rahul Shenoy et al.
-
Summary of Freepruner: a Training-free Approach For Large Multimodal Model Acceleration, by Bingxin Xu et al.
-
Summary of Fg-cxr: a Radiologist-aligned Gaze Dataset For Enhancing Interpretability in Chest X-ray Report Generation, by Trong Thang Pham et al.
-
Summary of Kinmo: Kinematic-aware Human Motion Understanding and Generation, by Pengfei Zhang et al.
-
Summary of Enhancing Instruction-following Capability Of Visual-language Models by Reducing Image Redundancy, by Te Yang et al.
-
Summary of Automatic Evaluation For Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark, by Rong-cheng Tu et al.
-
Summary of Interactive Visual Assessment For Text-to-image Generation Models, by Xiaoyue Mi et al.
-
Summary of Enhancing Grammatical Error Detection Using Bert with Cleaned Lang-8 Dataset, by Rahul Nihalani et al.
-
Summary of Large Language Model with Region-guided Referring and Grounding For Ct Report Generation, by Zhixuan Chen et al.
-
Summary of Rewind: Understanding Long Videos with Instructed Learnable Memory, by Anxhelo Diko et al.
-
Summary of Do Llms Agree on the Creativity Evaluation Of Alternative Uses?, by Abdullah Al Rabeyah et al.
-
Summary of A Survey on Llm-as-a-judge, by Jiawei Gu et al.
-
Summary of An Adversarial Feature Learning Based Semantic Communication Method For Human 3d Reconstruction, by Shaojiang Liu et al.
-
Summary of How Texts Help? a Fine-grained Evaluation to Reveal the Role Of Language in Vision-language Tracking, by Xuchen Li et al.
-
Summary of Aligning Generalisation Between Humans and Machines, by Filip Ilievski et al.
-
Summary of Llm For Barcodes: Generating Diverse Synthetic Data For Identity Documents, by Hitesh Laxmichand Patel et al.
-
Summary of Swissadt: An Audio Description Translation System For Swiss Languages, by Lukas Fischer et al.
-
Summary of Empowering Clients: Transformation Of Design Processes Due to Generative Ai, by Johannes Schneider et al.
-
Summary of Scribeagent: Towards Specialized Web Agents Using Production-scale Workflow Data, by Junhong Shen et al.
-
Summary of Videorepair: Improving Text-to-video Generation Via Misalignment Evaluation and Localized Refinement, by Daeun Lee et al.
-
Summary of Xgrammar: Flexible and Efficient Structured Generation Engine For Large Language Models, by Yixin Dong et al.
-
Summary of Rexrank: a Public Leaderboard For Ai-powered Radiology Report Generation, by Xiaoman Zhang et al.
-
Summary of Measuring Bullshit in the Language Games Played by Chatgpt, by Alessandro Trevisan et al.
-
Summary of Gradient-weighted Feature Back-projection: a Fast Alternative to Feature Distillation in 3d Gaussian Splatting, by Joji Joseph et al.
-
Summary of Toxilab: How Well Do Open-source Llms Generate Synthetic Toxicity Data?, by Zheng Hui et al.
-
Summary of Adversarial Prompt Distillation For Vision-language Models, by Lin Luo et al.
-
Summary of Beyond Visual Understanding: Introducing Parrot-360v For Vision Language Model Benchmarking, by Harsha Vardhan Khurdula et al.
-
Summary of Locref-diffusion: Tuning-free Layout and Appearance-guided Generation, by Fan Deng et al.
-
Summary of Vivid-10m: a Dataset and Baseline For Versatile and Interactive Video Local Editing, by Jiahao Hu et al.
-
Summary of Eadreg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model For Outdoor Point Cloud Registration, by Linrui Gong et al.
-
Summary of Ai-driven Real-time Monitoring Of Ground-nesting Birds: a Case Study on Curlew Detection Using Yolov10, by Carl Chalmers et al.
-
Summary of Event Uskt: U-state Space Model in Knowledge Transfer For Event Cameras, by Yuhui Lin et al.
-
Summary of Sycophancy in Large Language Models: Causes and Mitigations, by Lars Malmqvist
-
Summary of Mme-survey: a Comprehensive Survey on Evaluation Of Multimodal Llms, by Chaoyou Fu et al.
-
Summary of Pplqa: An Unsupervised Information-theoretic Quality Metric For Comparing Generative Large Language Models, by Gerald Friedland et al.
-
Summary of Robust Planning with Compound Llm Architectures: An Llm-modulo Approach, by Atharva Gundawar et al.
-
Summary of Mediating Modes Of Thought: Llm’s For Design Scripting, by Moritz Rietschel et al.
-
Summary of The Impossible Test: a 2024 Unsolvable Dataset and a Chance For An Agi Quiz, by David Noever et al.
-
Summary of Ensuring Safety and Trust: Analyzing the Risks Of Large Language Models in Medicine, by Yifan Yang et al.
-
Summary of Star-agents: Automatic Data Optimization with Llm Agents For Instruction Tuning, by Hang Zhou et al.