Paper List

Summary of Llm-based Offline Learning For Embodied Agents Via Consistency-guided Reward Ensemble, by Yujeong Lee et al.
Summary of Chatgen: Automatic Text-to-image Generation From Freestyle Chatting, by Chengyou Jia et al.
Summary of Strategic Prompting For Conversational Tasks: a Comparative Analysis Of Large Language Models Across Diverse Conversational Tasks, by Ratnesh Kumar Joshi et al.
Summary of A Study on Unsupervised Domain Adaptation For Semantic Segmentation in the Era Of Vision-language Models, by Manuel Schwonberg et al.
Summary of Topv-nav: Unlocking the Top-view Spatial Reasoning Potential Of Mllm For Zero-shot Object Navigation, by Linqing Zhong et al.
Summary of When Babies Teach Babies: Can Student Knowledge Sharing Outperform Teacher-guided Distillation on Small Datasets?, by Srikrishna Iyer
Summary of O1 Replication Journey — Part 2: Surpassing O1-preview Through Simple Distillation, Big Progress or Bitter Lesson?, by Zhen Huang et al.
Summary of Robospatial: Teaching Spatial Understanding to 2d and 3d Vision-language Models For Robotics, by Chan Hee Song et al.
Summary of From Generation to Judgment: Opportunities and Challenges Of Llm-as-a-judge, by Dawei Li et al.
Summary of F — a Model Of Events Based on the Foundational Ontology Dolce+dns Ultralite, by Ansgar Scherp et al.
Summary of Imperceptible Adversarial Examples in the Physical World, by Weilin Xu et al.
Summary of Dreamrunner: Fine-grained Compositional Story-to-video Generation with Retrieval-augmented Motion Adaptation, by Zun Wang et al.
Summary of Do Automatic Factuality Metrics Measure Factuality? a Critical Evaluation, by Sanjana Ramprasad et al.
Summary of Enhancing Llms For Power System Simulations: a Feedback-driven Multi-agent Framework, by Mengshuo Jia et al.
Summary of A Brief Summary Of Explanatory Virtues, by Ingrid Zukerman
Summary of Neuro-symbolic Evaluation Of Text-to-video Models Using Formal Verification, by S. P. Sharan et al.
Summary of Emotivetalk: Expressive Talking Head Generation Through Audio Information Decoupling and Emotional Video Diffusion, by Haotian Wang et al.
Summary of Steering Away From Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks, by Han Wang et al.
Summary of Chemsafetybench: Benchmarking Llm Safety on Chemistry Domain, by Haochen Zhao et al.
Summary of Document Haystacks: Vision-language Reasoning Over Piles Of 1000+ Documents, by Jun Chen et al.
Summary of Gradient-guided Parameter Mask For Multi-scenario Image Restoration Under Adverse Weather, by Jilong Guo et al.
Summary of Followgen: a Scaled Noise Conditional Diffusion Model For Car-following Trajectory Prediction, by Junwei You et al.
Summary of Is ‘right’ Right? Enhancing Object Orientation Understanding in Multimodal Language Models Through Egocentric Instruction Tuning, by Ji Hyeok Jung et al.
Summary of Drive: Dual-robustness Via Information Variability and Entropic Consistency in Source-free Unsupervised Domain Adaptation, by Ruiqiang Xiao et al.
Summary of From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles Against Rare Events, by Yan Miao et al.
Summary of Unitedvln: Generalizable Gaussian Splatting For Continuous Vision-language Navigation, by Guangzhao Dai et al.
Summary of Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models, by Donggeun Ko et al.
Summary of Enclip: Ensembling and Clustering-based Contrastive Language-image Pretraining For Fashion Multimodal Search with Limited Data and Low-quality Images, by Prithviraj Purushottam Naik et al.
Summary of Llm Augmentations to Support Analytical Reasoning Over Multiple Documents, by Raquib Bin Yousuf et al.
Summary of Cia: Controllable Image Augmentation Framework Based on Stable Diffusion, by Mohamed Benkedadra et al.
Summary of Med-persam: One-shot Visual Prompt Tuning For Personalized Segment Anything Model in Medical Domain, by Hangyul Yoon et al.
Summary of Local and Global Feature Attention Fusion Network For Face Recognition, by Wang Yu et al.
Summary of End-to-end Steering For Autonomous Vehicles Via Conditional Imitation Co-learning, by Mahmoud M. Kishky et al.
Summary of Salova: Segment-augmented Long Video Assistant For Targeted Retrieval and Routing in Long-form Video Analysis, by Junho Kim et al.
Summary of Enhancing Multi-agent Consensus Through Third-party Llm Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models, by Zhihua Duan et al.
Summary of Diagnosis Of Diabetic Retinopathy Using Machine Learning & Deep Learning Technique, by Eric Shah et al.
Summary of Probing For Consciousness in Machines, by Mathis Immertreu et al.
Summary of One Diffusion to Generate Them All, by Duong H. Le et al.
Summary of Bayling 2: a Multilingual Large Language Model with Efficient Language Alignment, by Shaolei Zhang et al.
Summary of Human-calibrated Automated Testing and Validation Of Generative Language Models, by Agus Sudjianto et al.
Summary of Brain-like Emergent Properties in Deep Networks: Impact Of Network Architecture, Datasets and Training, by Niranjan Rajesh et al.
Summary of Adapter-based Approaches to Knowledge-enhanced Language Models — a Survey, by Alexander Fichtl et al.
Summary of Synthesising Handwritten Music with Gans: a Comprehensive Evaluation Of Cyclewgan, Progan, and Dcgan, by Elona Shatri et al.
Summary of Enhancing Grammatical Error Detection Using Bert with Cleaned Lang-8 Dataset, by Rahul Nihalani et al.
Summary of Large Language Model with Region-guided Referring and Grounding For Ct Report Generation, by Zhixuan Chen et al.
Summary of Rewind: Understanding Long Videos with Instructed Learnable Memory, by Anxhelo Diko et al.
Summary of Do Llms Agree on the Creativity Evaluation Of Alternative Uses?, by Abdullah Al Rabeyah et al.
Summary of A Survey on Llm-as-a-judge, by Jiawei Gu et al.
Summary of An Adversarial Feature Learning Based Semantic Communication Method For Human 3d Reconstruction, by Shaojiang Liu et al.
Summary of How Texts Help? a Fine-grained Evaluation to Reveal the Role Of Language in Vision-language Tracking, by Xuchen Li et al.
Summary of Aligning Generalisation Between Humans and Machines, by Filip Ilievski et al.
Summary of Ontology-constrained Generation Of Domain-specific Clinical Summaries, by Gaya Mehenni and Amal Zouaq
Summary of “all That Glitters”: Approaches to Evaluations with Unreliable Model and Human Annotations, by Michael Hardy
Summary of Ramie: Retrieval-augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements, by Zaifu Zhan et al.
Summary of Ltcf-net: a Transformer-enhanced Dual-channel Fourier Framework For Low-light Image Restoration, by Gaojing Zhang and Jinglun Feng
Summary of Peng: Pose-enhanced Geo-localisation, by Tavis Shore et al.
Summary of Decoding Urban Industrial Complexity: Enhancing Knowledge-driven Insights Via Industryscopegpt, by Siqi Wang et al.
Summary of Fasttracktr: Towards Fast Multi-object Tracking with Transformers, by Pan Liao et al.
Summary of Creating Scalable Agi: the Open General Intelligence Framework, by Daniel A. Dollinger et al.
Summary of Do Llms Really Think Step-by-step in Implicit Reasoning?, by Yijiong Yu
Summary of Deep Learning For Automated Multi-scale Functional Field Boundaries Extraction Using Multi-date Sentinel-2 and Planetscope Imagery: Case Study Of Netherlands and Pakistan, by Saba Zahid et al.
Summary of Highly Efficient and Unsupervised Framework For Moving Object Detection in Satellite Videos, by C. Xiao et al.
Summary of Generative Prompt Internalization, by Haebin Shin et al.
Summary of Vivid-10m: a Dataset and Baseline For Versatile and Interactive Video Local Editing, by Jiahao Hu et al.
Summary of Locref-diffusion: Tuning-free Layout and Appearance-guided Generation, by Fan Deng et al.
Summary of Ai-driven Real-time Monitoring Of Ground-nesting Birds: a Case Study on Curlew Detection Using Yolov10, by Carl Chalmers et al.
Summary of Eadreg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model For Outdoor Point Cloud Registration, by Linrui Gong et al.
Summary of Event Uskt: U-state Space Model in Knowledge Transfer For Event Cameras, by Yuhui Lin et al.
Summary of Mme-survey: a Comprehensive Survey on Evaluation Of Multimodal Llms, by Chaoyou Fu et al.
Summary of Sycophancy in Large Language Models: Causes and Mitigations, by Lars Malmqvist
Summary of Pplqa: An Unsupervised Information-theoretic Quality Metric For Comparing Generative Large Language Models, by Gerald Friedland et al.
Summary of Unigaussian: Driving Scene Reconstruction From Multiple Camera Models Via Unified Gaussian Representations, by Yuan Ren et al.
Summary of Regulator-manufacturer Ai Agents Modeling: Mathematical Feedback-driven Multi-agent Llm Framework, by Yu Han and Zekun Guo
Summary of Designing Cellular Manufacturing System in Presence Of Alternative Process Plans, by Md. Kutub Uddin et al.
Summary of Exploiting Watermark-based Defense Mechanisms in Text-to-image Diffusion Models For Unauthorized Data Usage, by Soumil Datta et al.
Summary of Gradient-free Classifier Guidance For Diffusion Model Sampling, by Rahul Shenoy et al.
Summary of Exploring Large Language Models For Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions, by Shezheng Song
Summary of Freepruner: a Training-free Approach For Large Multimodal Model Acceleration, by Bingxin Xu et al.
Summary of Fg-cxr: a Radiologist-aligned Gaze Dataset For Enhancing Interpretability in Chest X-ray Report Generation, by Trong Thang Pham et al.
Summary of Enhancing Instruction-following Capability Of Visual-language Models by Reducing Image Redundancy, by Te Yang et al.
Summary of Kinmo: Kinematic-aware Human Motion Understanding and Generation, by Pengfei Zhang et al.
Summary of Automatic Evaluation For Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark, by Rong-cheng Tu et al.
Summary of Interactive Visual Assessment For Text-to-image Generation Models, by Xiaoyue Mi et al.
Summary of Texgen: a Generative Diffusion Model For Mesh Textures, by Xin Yu et al.
Summary of Resolution-agnostic Transformer-based Climate Downscaling, by Declan Curran, Hira Saleem, Sanaa Hobeichi, and Flora Salim
Summary of Focus: Knowledge-enhanced Adaptive Visual Compression For Few-shot Whole Slide Image Classification, by Zhengrui Guo et al.
Summary of Kbalign: Efficient Self Adaptation on Specific Knowledge Bases, by Zheni Zeng et al.
Summary of Videoespresso: a Large-scale Chain-of-thought Dataset For Fine-grained Video Reasoning Via Core Frame Selection, by Songhao Han et al.
Summary of Dynamics-aware Gaussian Splatting Streaming Towards Fast On-the-fly Training For 4d Reconstruction, by Zhening Liu et al.
Summary of Domain and Range Aware Synthetic Negatives Generation For Knowledge Graph Embedding Models, by Alberto Bernardi and Luca Costabello
Summary of Design-o-meter: Towards Evaluating and Refining Graphic Designs, by Sahil Goyal et al.
Summary of Llm For Barcodes: Generating Diverse Synthetic Data For Identity Documents, by Hitesh Laxmichand Patel et al.
Summary of Swissadt: An Audio Description Translation System For Swiss Languages, by Lukas Fischer et al.
Summary of Scribeagent: Towards Specialized Web Agents Using Production-scale Workflow Data, by Junhong Shen et al.
Summary of Empowering Clients: Transformation Of Design Processes Due to Generative Ai, by Johannes Schneider et al.
Summary of Xgrammar: Flexible and Efficient Structured Generation Engine For Large Language Models, by Yixin Dong et al.
Summary of Videorepair: Improving Text-to-video Generation Via Misalignment Evaluation and Localized Refinement, by Daeun Lee et al.
Summary of Rexrank: a Public Leaderboard For Ai-powered Radiology Report Generation, by Xiaoman Zhang et al.
Summary of Measuring Bullshit in the Language Games Played by Chatgpt, by Alessandro Trevisan et al.
Summary of Toxilab: How Well Do Open-source Llms Generate Synthetic Toxicity Data?, by Zheng Hui et al.