Paper List
-
Summary of Superhuman Performance Of a Large Language Model on the Reasoning Tasks Of a Physician, by Peter G. Brodeur et al.
-
Summary of Heterogeneous Graph Transformer For Multiple Tiny Object Tracking in Rgb-t Videos, by Qingyu Xu et al.
-
Summary of Tokens, the Oft-overlooked Appetizer: Large Language Models, the Distributional Hypothesis, and Meaning, by Julia Witte Zimmerman et al.
-
Summary of Llm-as-an-interviewer: Beyond Static Testing Through Dynamic Llm Evaluation, by Eunsu Kim et al.
-
Summary of Active Inference For Self-organizing Multi-llm Systems: a Bayesian Thermodynamic Approach to Adaptation, by Rithvik Prakki
-
Summary of Identifying and Manipulating Personality Traits in Llms Through Activation Engineering, by Rumi A. Allbert and James K. Wiles and Vlad Grankovsky
-
Summary of Gptdrawer: Enhancing Visual Synthesis Through Chatgpt, by Kun Li et al.
-
Summary of Nat-nl2gql: a Novel Multi-agent Framework For Translating Natural Language to Graph Query Language, by Yuanyuan Liang et al.
-
Summary of Imitate Before Detect: Aligning Machine Stylistic Preference For Machine-revised Text Detection, by Jiaqi Chen et al.
-
Summary of Coef-vq: Cost-efficient Video Quality Understanding Through a Cascaded Multimodal Llm Framework, by Xin Dong et al.
-
Summary of Multi-level Matching Network For Multimodal Entity Linking, by Zhiwei Hu et al.
-
Summary of Steganography in Game Actions, by Ching-chun Chang and Isao Echizen
-
Summary of Sweettok: Semantic-aware Spatial-temporal Tokenizer For Compact Video Discretization, by Zhentao Tan et al.
-
Summary of Unlocking Visual Secrets: Inverting Features with Diffusion Priors For Image Reconstruction, by Sai Qian Zhang et al.
-
Summary of Disentanglement and Compositionality Of Letter Identity and Letter Position in Variational Auto-encoder Vision Models, by Bruno Bianchi et al.
-
Summary of Geo-llava: a Large Multi-modal Model For Solving Geometry Math Problems with Meta In-context Learning, by Shihao Xu et al.
-
Summary of Fovealnet: Advancing Ai-driven Gaze Tracking Solutions For Optimized Foveated Rendering System Performance in Virtual Reality, by Wenxuan Liu et al.
-
Summary of Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions Of Visual-audio Content, by Sheng Wu et al.
-
Summary of Automatic Detection, Positioning and Counting Of Grape Bunches Using Robots, by Xumin Gao
-
Summary of Vca: Video Curious Agent For Long Video Understanding, by Zeyuan Yang et al.
-
Summary of Crossvit-augmented Geospatial-intelligence Visualization System For Tracking Economic Development Dynamics, by Yanbing Bai et al.
-
Summary of Dynamic Entity-masked Graph Diffusion Model For Histopathological Image Representation Learning, by Zhenfeng Zhuang et al.
-
Summary of Svgbuilder: Component-based Colored Svg Generation with Text-guided Autoregressive Transformers, by Zehao Chen et al.
-
Summary of Swifttry: Fast and Consistent Video Virtual Try-on with Diffusion Models, by Hung Nguyen et al.
-
Summary of Gaf: Gaussian Avatar Reconstruction From Monocular Videos Via Multi-view Diffusion, by Jiapeng Tang et al.
-
Summary of How Good Is My Story? Towards Quantitative Metrics For Evaluating Llm-generated Xai Narratives, by Timour Ichmoukhamedov et al.
-
Summary of Targeted Angular Reversal Of Weights (tars) For Knowledge Removal in Large Language Models, by Harry J. Davies et al.
-
Summary of Envisioning National Resources For Artificial Intelligence Research: Nsf Workshop Report, by Shantenu Jha and Yolanda Gil
-
Summary of Deepseek-vl2: Mixture-of-experts Vision-language Models For Advanced Multimodal Understanding, by Zhiyu Wu et al.
-
Summary of Brushedit: All-in-one Image Inpainting and Editing, by Yaowei Li et al.
-
Summary of Iris: Breaking Gui Complexity with Adaptive Focus and Self-refining, by Zhiqi Ge et al.
-
Summary of A Dual Contrastive Framework, by Yuan Sun et al.
-
Summary of Neural-symbolic Reasoning Over Knowledge Graphs: a Survey From a Query Perspective, by Lihui Liu et al.
-
Summary of Apollo: An Exploration Of Video Understanding in Large Multimodal Models, by Orr Zohar et al.
-
Summary of Evaluating Robustness Of Llms on Crisis-related Microblogs Across Events, Information Types, and Linguistic Features, by Muhammad Imran et al.
-
Summary of Tango: Training-free Embodied Ai Agents For Open-world Tasks, by Filippo Ziliotto et al.
-
Summary of Generative Adversarial Reviews: When Llms Become the Critic, by Nicolas Bougie and Narimasa Watanabe
-
Summary of Supermerge: An Approach For Gradient-based Model Merging, by Haoyu Yang et al.
-
Summary of Leveraging Audio and Text Modalities in Mental Health: a Study Of Llms Performance, by Abdelrahman A. Ali et al.
-
Summary of Autoprep: Natural Language Question-aware Data Preparation with a Multi-agent Framework, by Meihao Fan and Ju Fan and Nan Tang and Lei Cao and Guoliang Li and Xiaoyong Du
-
Summary of Constrained Decoding with Speculative Lookaheads, by Nishanth Nakshatri et al.
-
Summary of Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with Guidelinellm, by Shaoqing Zhang et al.
-
Summary of On Round-off Errors and Gaussian Blur in Superresolution and in Image Registration, by Serap A. Savari
-
Summary of Memory Layers at Scale, by Vincent-pierre Berges et al.
-
Summary of Learning Visually Grounded Domain Ontologies Via Embodied Conversation and Explanation, by Jonghyuk Park et al.
-
Summary of Semi-iin: Semi-supervised Intra-inter Modal Interaction Learning Network For Multimodal Sentiment Analysis, by Jinhao Lin et al.
-
Summary of Cp-detr: Concept Prompt Guide Detr Toward Stronger Universal Object Detection, by Qibo Chen et al.
-
Summary of Autopatent: a Multi-agent Framework For Automatic Patent Generation, by Qiyao Wang et al.
-
Summary of Meralion-audiollm: Bridging Audio and Language with Large Language Models, by Yingxu He et al.
-
Summary of B-vllm: a Vision Large Language Model with Balanced Spatio-temporal Tokens, by Zhuqiang Lu et al.
-
Summary of Enhancing Nursing and Elderly Care with Large Language Models: An Ai-driven Framework, by Qiao Sun et al.
-
Summary of Sumi-ifl: An Information-theoretic Framework For Image Forgery Localization with Sufficiency and Minimality Constraints, by Ziqi Sheng et al.
-
Summary of Small Language Model As Data Prospector For Large Language Model, by Shiwen Ni et al.
-
Summary of Visual Object Tracking Across Diverse Data Modalities: a Review, by Mengmeng Wang et al.
-
Summary of Tsgaussian: Semantic and Depth-guided Target-specific Gaussian Splatting From Sparse Views, by Liang Zhao et al.
-
Summary of Large Action Models: From Inception to Implementation, by Lu Wang et al.
-
Summary of Data Pruning Can Do More: a Comprehensive Data Pruning Approach For Object Re-identification, by Zi Yang et al.
-
Summary of Gaokao-eval: Does High Scores Truly Reflect Strong Capabilities in Llms?, by Zhikai Lei et al.
-
Summary of Retqa: a Large-scale Open-domain Tabular Question Answering Dataset For Real Estate Sector, by Zhensheng Wang et al.
-
Summary of Route: Robust Multitask Tuning and Collaboration For Text-to-sql, by Yang Qin et al.
-
Summary of Vlr-bench: Multilingual Benchmark Dataset For Vision-language Retrieval Augmented Generation, by Hyeonseok Lim et al.
-
Summary of Beware Of Metacognitive Laziness: Effects Of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance, by Yizhou Fan et al.
-
Summary of Benchmarking Llms For Mimicking Child-caregiver Language in Interaction, by Jing Liu et al.
-
Summary of Causal Graphical Models For Vision-language Compositional Understanding, by Fiorenzo Parascandolo et al.
-
Summary of Word Sense Linking: Disambiguating Outside the Sandbox, by Andrei Stefan Bejgu et al.
-
Summary of All You Need in Knowledge Distillation Is a Tailored Coordinate System, by Junjie Zhou et al.
-
Summary of Ai Predicts Agi: Leveraging Agi Forecasting and Peer Review to Explore Llms’ Complex Reasoning Capabilities, by Fabrizio Davide et al.
-
Summary of Ufo: Enhancing Diffusion-based Video Generation with a Uniform Frame Organizer, by Delong Liu et al.
-
Summary of Uncommon Belief in Rationality, by Qi Shi and Pavel Naumov
-
Summary of Imitate, Explore, and Self-improve: a Reproduction Report on Slow-thinking Reasoning Systems, by Yingqian Min et al.
-
Summary of Vision Transformers For Efficient Indoor Pathloss Radio Map Prediction, by Edvard Ghukasyan et al.
-
Summary of New Keypoint-based Approach For Recognising British Sign Language (bsl) From Sequences, by Oishi Deb and Kr Prajwal and Andrew Zisserman
-
Summary of The Parameters Of Educability, by Leslie G. Valiant
-
Summary of Efficient and Comprehensive Feature Extraction in Large Vision-language Model For Clinical Pathology Analysis, by Shengxuming Zhang et al.
-
Summary of Internlm-xcomposer2.5-omnilive: a Comprehensive Multimodal System For Long-term Streaming Video and Audio Interactions, by Pan Zhang et al.
-
Summary of Timerefine: Temporal Grounding with Time Refining Video Llm, by Xizi Wang et al.
-
Summary of Olympus: a Universal Task Router For Computer Vision Tasks, by Yuanze Lin et al.
-
Summary of Evaluation Agent: Efficient and Promptable Evaluation Framework For Visual Generative Models, by Fan Zhang and Shulin Tian and Ziqi Huang and Yu Qiao and Ziwei Liu
-
Summary of Bridging Ai and Science: Implications From a Large-scale Literature Analysis Of Ai4science, by Yutong Xie et al.
-
Summary of From Noise to Nuance: Advances in Deep Generative Image Models, by Benji Peng et al.
-
Summary of Systematic Analysis Of Llm Contributions to Planning: Solver, Verifier, Heuristic, by Haoming Li et al.
-
Summary of What Makes Cryptic Crosswords Challenging For Llms?, by Abdelrahman Sadallah et al.
-
Summary of Shiksha: a Technical Domain Focused Translation Dataset and Model For Indian Languages, by Advait Joglekar and Srinivasan Umesh
-
Summary of Multi-task Learning with Llms For Implicit Sentiment Analysis: Data-level and Task-level Automatic Weight Learning, by Wenna Lai et al.
-
Summary of Motif Guided Graph Transformer with Combinatorial Skeleton Prototype Learning For Skeleton-based Person Re-identification, by Haocong Rao et al.
-
Summary of A Context-enhanced Framework For Sequential Graph Reasoning, by Shuo Shi et al.
-
Summary of Forest-of-thought: Scaling Test-time Compute For Enhancing Llm Reasoning, by Zhenni Bi et al.
-
Summary of Temporal Numeric Planning with Patterns, by Matteo Cardellini and Enrico Giunchiglia
-
Summary of Polyipa — Multilingual Phoneme-to-grapheme Conversion Model, by Davor Lauc
-
Summary of Goal-driven Query Answering Over First- and Second-order Dependencies with Equality, by Efthymia Tsamoura and Boris Motik
-
Summary of Lmagent: a Large-scale Multimodal Agents Society For Multi-user Simulation, by Yijun Liu et al.
-
Summary of Vlms Meet Uda: Boosting Transferability Of Open Vocabulary Segmentation with Unsupervised Domain Adaptation, by Roberto Alcover-couso et al.
-
Summary of First Train to Generate, Then Generate to Train: Unitedsynt5 For Few-shot Nli, by Sourav Banerjee et al.
-
Summary of Speeding Up Approximate Map by Applying Domain Knowledge About Relevant Variables, by Johan Kwisthout and Andrew Schroeder
-
Summary of Towards Understanding the Robustness Of Llm-based Evaluations Under Perturbations, by Manav Chaudhary et al.