Summary of Pedestrian Attribute Recognition: a New Benchmark Dataset and a Large Language Model Augmented Framework, by Jiandong Jin et al.
Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework
by Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li
First submitted to arxiv on: 19 Aug 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | The authors propose MSP60K, a new large-scale dataset that addresses the limitations of existing pedestrian attribute recognition (PAR) datasets. It contains 60,122 images annotated with 57 attributes across eight scenarios, and synthetic degradation is applied to further narrow the gap between current datasets and challenging real-world scenarios. A benchmark is established by evaluating 17 representative PAR models under both random and cross-domain split protocols. The paper also introduces LLM-PAR, a Large Language Model (LLM) augmented PAR framework that uses a Vision Transformer (ViT) backbone to extract pedestrian features and a multi-embedding query Transformer to learn partial-aware features for attribute classification. The framework is further enhanced with an LLM for ensemble learning and visual feature augmentation, and experiments on multiple PAR benchmark datasets demonstrate its efficacy. |
Low | GrooveSquid.com (original content) | A new dataset called MSP60K is created to help machines recognize the attributes of people, such as what they are wearing. It contains about 60,000 pictures of pedestrians, each labeled with these attributes. To better match real-world situations, some pictures are deliberately degraded, for example by adding artificial noise and changing the lighting. Many different models are tested on this new dataset, creating a fair benchmark for comparing them. The authors also build a new model called LLM-PAR, which combines a computer vision model with a large language model to improve recognition. The authors report that it recognizes pedestrian attributes better than other models across several benchmark datasets. |
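The LLM-PAR pipeline described in the medium-difficulty summary can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation: the single-head attention step only loosely mimics a multi-embedding query Transformer, and the names `query_pool` and `predict_attributes`, the hand-picked query and head weights, the stand-in `llm_logits`, and the choice to ensemble by averaging logits are all assumptions made for illustration. What it does show is the general shape of the task: each attribute gets its own query that attends over ViT-style patch features, and the output is multi-label, with an independent yes/no decision per attribute.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_pool(query: Vector, patches: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention (hypothetical helper):
    one query vector attends over all patch tokens and returns their
    attention-weighted average, i.e. a query-specific (part-aware) feature."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, p) * scale for p in patches])
    dim = len(patches[0])
    return [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_attributes(visual_logits: List[float], llm_logits: List[float],
                       threshold: float = 0.5) -> List[bool]:
    """Multi-label prediction (hypothetical helper). Averaging the two
    branches' logits is an assumed ensemble rule, not the paper's.
    Each attribute gets its own sigmoid and threshold."""
    probs = [sigmoid((v + l) / 2.0) for v, l in zip(visual_logits, llm_logits)]
    return [p >= threshold for p in probs]


# Toy usage: two "patch tokens" and two attributes (all numbers are made up).
patches = [[1.0, 0.0], [0.0, 1.0]]    # stand-in for ViT patch features
queries = [[4.0, 0.0], [0.0, 4.0]]    # hypothetical per-attribute queries
heads = [[1.0, -1.0], [-1.0, 1.0]]    # hypothetical per-attribute linear heads
visual_logits = [dot(h, query_pool(q, patches)) for q, h in zip(queries, heads)]
llm_logits = [0.5, -2.0]              # stand-in for an LLM branch's scores
print(predict_attributes(visual_logits, llm_logits))  # [True, False]
```

Because PAR is multi-label, each attribute is decided by its own sigmoid and threshold rather than by a single softmax over classes. In a real system the queries and heads would be learned during training rather than set by hand.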
Keywords
» Artificial intelligence » Embedding » Large language model » Transformer » Vision transformer (ViT)