
Summary of Pedestrian Attribute Recognition: a New Benchmark Dataset and a Large Language Model Augmented Framework, by Jiandong Jin et al.


Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework

by Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li

First submitted to arxiv on: 19 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on the paper's arXiv page.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
A new large-scale dataset, MSP60K, is proposed to address the limitations of existing pedestrian attribute recognition (PAR) datasets. It contains 60,122 images annotated with 57 attributes across eight scenarios, and synthetic degradation is applied to narrow the gap between current datasets and challenging real-world scenarios. A benchmark is established by evaluating 17 representative PAR models under two protocols: a random split and a cross-domain split. The authors also introduce LLM-PAR, a Large Language Model (LLM) augmented PAR framework that uses a Vision Transformer (ViT) backbone to extract features and a multi-embedding query Transformer to learn partial-aware features for attribute classification. The framework is further enhanced with an LLM for ensemble learning and visual feature augmentation, and experiments on multiple PAR benchmark datasets demonstrate its effectiveness.
Low Difficulty Summary (written by GrooveSquid.com, original content)
A new dataset called MSP60K was created to help machines recognize what people in images look like. It contains about 60,000 pictures of pedestrians, each labeled with attributes such as what they are wearing. To better match real-world situations, some pictures were deliberately degraded, for example by adding artificial noise or changing the lighting. Many existing models were tested on this dataset, creating a fair benchmark for comparing them. The authors also built a new model called LLM-PAR, which combines computer vision with a large language model (an AI system that works with text). The authors report that it recognizes pedestrian attributes better than other models across several datasets and situations.
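
The two benchmark protocols, random split and cross-domain split, differ in whether the test scenarios also appear during training. The sketch below illustrates that difference, assuming (our reading, not a detail stated in this summary) that the cross-domain protocol holds out entire scenarios for testing. The field names `image_id` and `scenario`, the scenario names, the 80/20 ratio and the choice of held-out scenarios are all hypothetical and are not the actual MSP60K splits.

```python
import random


def random_split(samples, train_ratio=0.8, seed=0):
    """Shuffle all samples together and cut them into train/test sets.

    Images from every scenario can land on both sides of the split.
    The 0.8 ratio is a hypothetical default, not the MSP60K setting.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def cross_domain_split(samples, test_scenarios):
    """Send every image from the held-out scenarios to the test set.

    The model is then evaluated only on scenarios it never saw in training.
    Which scenarios to hold out is an assumption made by the caller.
    """
    held_out = set(test_scenarios)
    train = [s for s in samples if s["scenario"] not in held_out]
    test = [s for s in samples if s["scenario"] in held_out]
    return train, test


if __name__ == "__main__":
    # Toy data: 8 hypothetical scenarios with 10 images each.
    samples = [{"image_id": i, "scenario": f"scene_{i % 8}"} for i in range(80)]

    train, test = random_split(samples)
    print(len(train), len(test))  # 64 16

    train, test = cross_domain_split(samples, test_scenarios=["scene_6", "scene_7"])
    print(len(train), len(test))  # 60 20
```

Under the random split, a model can benefit from having seen the test scenarios during training; the cross-domain split removes that advantage, which is why it is the harder test of generalization.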

Keywords

» Artificial intelligence  » Embedding  » Large language model  » Transformer  » Vision transformer  » Vit