Paper List
- Summary of Fovealnet: Advancing Ai-driven Gaze Tracking Solutions For Optimized Foveated Rendering System Performance in Virtual Reality, by Wenxuan Liu et al.
- Summary of Automatic Detection, Positioning and Counting Of Grape Bunches Using Robots, by Xumin Gao
- Summary of Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions Of Visual-audio Content, by Sheng Wu et al.
- Summary of Vca: Video Curious Agent For Long Video Understanding, by Zeyuan Yang et al.
- Summary of Crossvit-augmented Geospatial-intelligence Visualization System For Tracking Economic Development Dynamics, by Yanbing Bai et al.
- Summary of Dynamic Entity-masked Graph Diffusion Model For Histopathological Image Representation Learning, by Zhenfeng Zhuang et al.
- Summary of Svgbuilder: Component-based Colored Svg Generation with Text-guided Autoregressive Transformers, by Zehao Chen et al.
- Summary of Automated Image Captioning with Cnns and Transformers, by Joshua Adrian Cahyono et al.
- Summary of On Adversarial Robustness and Out-of-distribution Robustness Of Large Language Models, by April Yang et al.
- Summary of Lan: Learning to Adapt Noise For Image Denoising, by Changjin Kim et al.
- Summary of Evaluation Of Gpt-4o and Gpt-4o-mini’s Vision Capabilities For Compositional Analysis From Dried Solution Drops, by Deven B. Dangi et al.
- Summary of Chasing Progress, Not Perfection: Revisiting Strategies For End-to-end Llm Plan Generation, by Sukai Huang et al.
- Summary of Learning to Verify Summary Facts with Fine-grained Llm Feedback, by Jihwan Oh et al.
- Summary of Iris: Breaking Gui Complexity with Adaptive Focus and Self-refining, by Zhiqi Ge et al.
- Summary of A Dual Contrastive Framework, by Yuan Sun et al.
- Summary of Apollo: An Exploration Of Video Understanding in Large Multimodal Models, by Orr Zohar et al.
- Summary of Neural-symbolic Reasoning Over Knowledge Graphs: a Survey From a Query Perspective, by Lihui Liu et al.
- Summary of Tango: Training-free Embodied Ai Agents For Open-world Tasks, by Filippo Ziliotto et al.
- Summary of Generative Adversarial Reviews: When Llms Become the Critic, by Nicolas Bougie and Narimasa Watanabe
- Summary of Evaluating Robustness Of Llms on Crisis-related Microblogs Across Events, Information Types, and Linguistic Features, by Muhammad Imran et al.
- Summary of Supermerge: An Approach For Gradient-based Model Merging, by Haoyu Yang et al.
- Summary of Constrained Decoding with Speculative Lookaheads, by Nishanth Nakshatri et al.
- Summary of Autoprep: Natural Language Question-aware Data Preparation with a Multi-agent Framework, by Meihao Fan and Ju Fan and Nan Tang and Lei Cao and Guoliang Li and Xiaoyong Du
- Summary of Leveraging Audio and Text Modalities in Mental Health: a Study Of Llms Performance, by Abdelrahman A. Ali et al.
- Summary of Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with Guidelinellm, by Shaoqing Zhang et al.
- Summary of Llm-as-an-interviewer: Beyond Static Testing Through Dynamic Llm Evaluation, by Eunsu Kim et al.
- Summary of Identifying and Manipulating Personality Traits in Llms Through Activation Engineering, by Rumi A. Allbert and James K. Wiles and Vlad Grankovsky
- Summary of Active Inference For Self-organizing Multi-llm Systems: a Bayesian Thermodynamic Approach to Adaptation, by Rithvik Prakki
- Summary of Nat-nl2gql: a Novel Multi-agent Framework For Translating Natural Language to Graph Query Language, by Yuanyuan Liang et al.
- Summary of Gptdrawer: Enhancing Visual Synthesis Through Chatgpt, by Kun Li et al.
- Summary of Imitate Before Detect: Aligning Machine Stylistic Preference For Machine-revised Text Detection, by Jiaqi Chen et al.
- Summary of Steganography in Game Actions, by Ching-chun Chang and Isao Echizen
- Summary of Coef-vq: Cost-efficient Video Quality Understanding Through a Cascaded Multimodal Llm Framework, by Xin Dong et al.
- Summary of Enhancing Nursing and Elderly Care with Large Language Models: An Ai-driven Framework, by Qiao Sun et al.
- Summary of Sumi-ifl: An Information-theoretic Framework For Image Forgery Localization with Sufficiency and Minimality Constraints, by Ziqi Sheng et al.
- Summary of Small Language Model As Data Prospector For Large Language Model, by Shiwen Ni et al.
- Summary of Large Action Models: From Inception to Implementation, by Lu Wang et al.
- Summary of Visual Object Tracking Across Diverse Data Modalities: a Review, by Mengmeng Wang et al.
- Summary of Tsgaussian: Semantic and Depth-guided Target-specific Gaussian Splatting From Sparse Views, by Liang Zhao et al.
- Summary of Gaokao-eval: Does High Scores Truly Reflect Strong Capabilities in Llms?, by Zhikai Lei et al.
- Summary of Data Pruning Can Do More: a Comprehensive Data Pruning Approach For Object Re-identification, by Zi Yang et al.
- Summary of Retqa: a Large-scale Open-domain Tabular Question Answering Dataset For Real Estate Sector, by Zhensheng Wang et al.
- Summary of Vlr-bench: Multilingual Benchmark Dataset For Vision-language Retrieval Augmented Generation, by Hyeonseok Lim et al.
- Summary of Route: Robust Multitask Tuning and Collaboration For Text-to-sql, by Yang Qin et al.
- Summary of Swifttry: Fast and Consistent Video Virtual Try-on with Diffusion Models, by Hung Nguyen et al.
- Summary of Gaf: Gaussian Avatar Reconstruction From Monocular Videos Via Multi-view Diffusion, by Jiapeng Tang et al.
- Summary of How Good Is My Story? Towards Quantitative Metrics For Evaluating Llm-generated Xai Narratives, by Timour Ichmoukhamedov et al.
- Summary of Targeted Angular Reversal Of Weights (tars) For Knowledge Removal in Large Language Models, by Harry J. Davies et al.
- Summary of Envisioning National Resources For Artificial Intelligence Research: Nsf Workshop Report, by Shantenu Jha and Yolanda Gil
- Summary of Brushedit: All-in-one Image Inpainting and Editing, by Yaowei Li et al.
- Summary of Deepseek-vl2: Mixture-of-experts Vision-language Models For Advanced Multimodal Understanding, by Zhiyu Wu et al.
- Summary of Imitate, Explore, and Self-improve: a Reproduction Report on Slow-thinking Reasoning Systems, by Yingqian Min et al.
- Summary of New Keypoint-based Approach For Recognising British Sign Language (bsl) From Sequences, by Oishi Deb and Kr Prajwal and Andrew Zisserman
- Summary of The Parameters Of Educability, by Leslie G. Valiant
- Summary of Vision Transformers For Efficient Indoor Pathloss Radio Map Prediction, by Edvard Ghukasyan et al.
- Summary of Efficient and Comprehensive Feature Extraction in Large Vision-language Model For Clinical Pathology Analysis, by Shengxuming Zhang et al.
- Summary of Internlm-xcomposer2.5-omnilive: a Comprehensive Multimodal System For Long-term Streaming Video and Audio Interactions, by Pan Zhang et al.
- Summary of Timerefine: Temporal Grounding with Time Refining Video Llm, by Xizi Wang et al.
- Summary of Olympus: a Universal Task Router For Computer Vision Tasks, by Yuanze Lin et al.
- Summary of Bridging Ai and Science: Implications From a Large-scale Literature Analysis Of Ai4science, by Yutong Xie et al.
- Summary of Evaluation Agent: Efficient and Promptable Evaluation Framework For Visual Generative Models, by Fan Zhang and Shulin Tian and Ziqi Huang and Yu Qiao and Ziwei Liu
- Summary of Systematic Analysis Of Llm Contributions to Planning: Solver, Verifier, Heuristic, by Haoming Li et al.
- Summary of From Noise to Nuance: Advances in Deep Generative Image Models, by Benji Peng et al.
- Summary of On Round-off Errors and Gaussian Blur in Superresolution and in Image Registration, by Serap A. Savari
- Summary of Memory Layers at Scale, by Vincent-pierre Berges et al.
- Summary of Autopatent: a Multi-agent Framework For Automatic Patent Generation, by Qiyao Wang et al.
- Summary of Learning Visually Grounded Domain Ontologies Via Embodied Conversation and Explanation, by Jonghyuk Park et al.
- Summary of Semi-iin: Semi-supervised Intra-inter Modal Interaction Learning Network For Multimodal Sentiment Analysis, by Jinhao Lin et al.
- Summary of Cp-detr: Concept Prompt Guide Detr Toward Stronger Universal Object Detection, by Qibo Chen et al.
- Summary of Meralion-audiollm: Bridging Audio and Language with Large Language Models, by Yingxu He et al.
- Summary of B-vllm: a Vision Large Language Model with Balanced Spatio-temporal Tokens, by Zhuqiang Lu et al.
- Summary of Polyipa — Multilingual Phoneme-to-grapheme Conversion Model, by Davor Lauc
- Summary of Goal-driven Query Answering Over First- and Second-order Dependencies with Equality, by Efthymia Tsamoura and Boris Motik
- Summary of Lmagent: a Large-scale Multimodal Agents Society For Multi-user Simulation, by Yijun Liu et al.
- Summary of Vlms Meet Uda: Boosting Transferability Of Open Vocabulary Segmentation with Unsupervised Domain Adaptation, by Roberto Alcover-couso et al.
- Summary of First Train to Generate, Then Generate to Train: Unitedsynt5 For Few-shot Nli, by Sourav Banerjee et al.
- Summary of Speeding Up Approximate Map by Applying Domain Knowledge About Relevant Variables, by Johan Kwisthout and Andrew Schroeder
- Summary of Towards Understanding the Robustness Of Llm-based Evaluations Under Perturbations, by Manav Chaudhary et al.
- Summary of Towards a Multimodal Large Language Model with Pixel-level Insight For Biomedicine, by Xiaoshuang Huang et al.
- Summary of Instancecap: Improving Text-to-video Generation Via Instance-aware Structured Caption, by Tiehan Fan et al.
- Summary of Advancing Attribution-based Neural Network Explainability Through Relative Absolute Magnitude Layer-wise Relevance Propagation and Multi-component Evaluation, by Davor Vukadin et al.
- Summary of Beware Of Metacognitive Laziness: Effects Of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance, by Yizhou Fan et al.
- Summary of Benchmarking Llms For Mimicking Child-caregiver Language in Interaction, by Jing Liu et al.
- Summary of Causal Graphical Models For Vision-language Compositional Understanding, by Fiorenzo Parascandolo et al.
- Summary of Ai Predicts Agi: Leveraging Agi Forecasting and Peer Review to Explore Llms’ Complex Reasoning Capabilities, by Fabrizio Davide et al.
- Summary of Word Sense Linking: Disambiguating Outside the Sandbox, by Andrei Stefan Bejgu et al.
- Summary of All You Need in Knowledge Distillation Is a Tailored Coordinate System, by Junjie Zhou et al.
- Summary of Ufo: Enhancing Diffusion-based Video Generation with a Uniform Frame Organizer, by Delong Liu et al.
- Summary of Uncommon Belief in Rationality, by Qi Shi and Pavel Naumov
- Summary of Coverage-based Fairness in Multi-document Summarization, by Haoyuan Li et al.
- Summary of Gama: Generative Agents For Multi-agent Autoformalization, by Agnieszka Mensfelt and Kostas Stathis and Vince Trencsenyi
- Summary of Exploring Large Language Models on Cross-cultural Values in Connection with Training Methodology, by Minsang Kim and Seungjun Baek
- Summary of Labits: Layered Bidirectional Time Surfaces Representation For Event Camera-based Continuous Dense Trajectory Estimation, by Zhongyang Zhang et al.