Summary of Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking, by Wei Feng et al.
Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking
by Wei Feng, Feifan Wang, Ruize Han, Zekun Qian, Song Wang
First submitted to arXiv on: 31 Jan 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper tackles a novel problem in multi-person video surveillance: multi-view multi-human association and tracking (MvMHAT), which aims to track each person over time within every camera view while simultaneously identifying the same person across views. This differs from earlier multi-object tracking (MOT) and multi-camera MOT tasks, which only consider tracking humans over time. To address the problem, the authors propose an end-to-end network that learns in a self-supervised manner, built on a spatial-temporal self-consistency rationale with three properties: reflexivity, symmetry, and transitivity. The resulting losses guide both appearance feature learning and assignment matrix optimization, so that multiple humans can be associated over time and across views. The authors also build two large-scale benchmarks for training and testing different algorithms, and experiments verify the effectiveness of the proposed method. The code and benchmarks have been released publicly. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary Multi-view multi-human association and tracking is a new challenge in video surveillance. It’s like trying to follow a group of people on multiple cameras at once! Existing approaches either track people over time or match them across different cameras, but not both at once, which makes the combined problem harder to solve. To make things easier, the researchers created a special kind of network that can learn from itself and figure out how to match people across cameras and over time. They also built two big datasets to test their approach and to see how well other algorithms work. |
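The reflexivity, symmetry, and transitivity properties mentioned in the medium difficulty summary can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's actual losses: the function names, the dot-product similarity, the `temperature` value, and the toy feature vectors are all hypothetical. It assumes each person detection is described by an appearance feature vector and that a soft assignment matrix between two sets of detections is a row-wise softmax over feature similarities. Matching a set to itself (reflexivity), going there and back (symmetry), or going around a three-set cycle (transitivity) should then land close to the identity matrix, and the distance from the identity can serve as a training signal that needs no identity labels.

```python
import math


def soft_assignment(feats_a, feats_b, temperature=0.1):
    """Row-wise softmax over dot-product similarities (assumed similarity).

    Row i is a soft match of detection i in set A to every detection in set B.
    """
    rows = []
    for fa in feats_a:
        logits = [sum(x * y for x, y in zip(fa, fb)) / temperature for fb in feats_b]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def matmul(p, q):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def identity_gap(m):
    """Mean absolute difference between a square matrix and the identity."""
    n = len(m)
    return sum(
        abs(m[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n)
    ) / (n * n)


def reflexivity_loss(a):
    # Matching a set with itself should give the identity.
    return identity_gap(soft_assignment(a, a))


def symmetry_loss(a, b):
    # A -> B -> A should bring every person back to themselves.
    return identity_gap(matmul(soft_assignment(a, b), soft_assignment(b, a)))


def transitivity_loss(a, b, c):
    # A -> B -> C -> A should also bring every person back to themselves.
    cycle = matmul(
        matmul(soft_assignment(a, b), soft_assignment(b, c)),
        soft_assignment(c, a),
    )
    return identity_gap(cycle)


# Toy data: the same three people seen in three views or frames, in shuffled order.
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
view_c = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Uninformative features: every detection looks the same.
blurry = [[1.0, 0.0, 0.0]] * 3

print("discriminative features:", symmetry_loss(view_a, view_b),
      transitivity_loss(view_a, view_b, view_c))
print("uninformative features: ", symmetry_loss(blurry, blurry))
```

With discriminative features the three losses come out near zero, while identical features for every person give a clearly larger value, which is the behavior a self-consistency objective relies on. The sets can stand for different camera views at the same moment, different frames of the same view, or a mix of both.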
Keywords
* Artificial intelligence
* Optimization
* Self supervised
* Tracking