Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline
by Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo
First submitted to arXiv on: 9 Mar 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A new long-term, large-scale frame-event single object tracking dataset, called FELT, is proposed to evaluate the performance of existing tracking algorithms in real-world scenarios. FELT contains 742 videos and 1,594,474 paired RGB frames and event streams, making it the largest frame-event tracking dataset to date. To handle incomplete object appearance caused by challenging factors and the spatial sparsity of event streams, a novel associative memory Transformer network is proposed as a unified backbone, introducing modern Hopfield layers into multi-head self-attention blocks to fuse RGB and event data (a minimal sketch of this fusion idea appears after the table). The model is evaluated on multiple datasets, including FELT, RGB-Thermal, RGB-Depth, and DepthTrack, demonstrating its effectiveness. |
| Low | GrooveSquid.com (original content) | A team of researchers created a new way to track objects over time by combining two types of information: what something looks like (RGB) and how it’s changing (event streams). They made a big dataset with lots of examples to test this idea. To make it work better, they developed a special computer program that combines both types of information. They tested their program on several different datasets and showed that it works well. |
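The medium summary mentions introducing modern Hopfield layers into self-attention blocks to fuse RGB and event tokens. Below is a minimal, hypothetical PyTorch sketch of that idea, relying on the known equivalence between one modern-Hopfield retrieval step and a softmax attention update with an inverse temperature beta. The class name, token shapes, and single-retrieval-step design are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HopfieldFusionBlock(nn.Module):
    """Illustrative associative-memory fusion block (not the paper's code).

    One modern-Hopfield retrieval step has the form
    xi_new = M @ softmax(beta * M^T @ xi), where the stored patterns M are
    here the concatenated RGB and event tokens, so retrieval mixes the two
    modalities in a single attention-like update.
    """

    def __init__(self, dim: int, beta: float = 1.0):
        super().__init__()
        self.beta = beta                  # inverse temperature of retrieval
        self.query = nn.Linear(dim, dim)  # projects state patterns (queries)
        self.key = nn.Linear(dim, dim)    # projects stored patterns (memory)
        self.value = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens: torch.Tensor,
                event_tokens: torch.Tensor) -> torch.Tensor:
        # Stored patterns: both modalities concatenated along the token axis.
        memory = torch.cat([rgb_tokens, event_tokens], dim=1)  # (B, Nr+Ne, C)
        q = self.query(memory)
        k = self.key(memory)
        v = self.value(memory)
        # One Hopfield retrieval step == scaled softmax attention; a larger
        # beta sharpens the retrieval toward a single stored pattern.
        scores = self.beta * q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        attn = F.softmax(scores, dim=-1)
        # Residual connection + layer norm, transformer-style.
        return self.norm(memory + attn @ v)

# Toy usage: 8 RGB tokens and 8 event tokens with 64-dim embeddings.
block = HopfieldFusionBlock(dim=64)
fused = block(torch.randn(2, 8, 64), torch.randn(2, 8, 64))
print(fused.shape)  # torch.Size([2, 16, 64])
```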
Keywords
* Artificial intelligence
* Object tracking
* Self attention
* Tracking
* Transformer