
Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

by Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo

First submitted to arxiv on: 9 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
A new long-term, large-scale frame-event single object tracking dataset, called FELT, is proposed to evaluate existing tracking algorithms in real-world scenarios. The FELT dataset contains 742 videos and 1,594,474 RGB frame and event stream pairs, making it the largest frame-event tracking dataset to date. To handle incomplete target information caused by challenging factors and the spatial sparsity of event streams, a novel associative memory Transformer network is proposed as a unified backbone: modern Hopfield layers are introduced into multi-head self-attention blocks to fuse RGB and event data. The model is evaluated on multiple datasets, including FELT, RGB-Thermal, RGB-Depth, and DepthTrack, demonstrating its effectiveness.

Low Difficulty Summary (written by GrooveSquid.com; original content)
A team of researchers created a new way to track objects over time by combining two types of information: what something looks like (RGB) and how it is changing (event streams). They made a big dataset with lots of examples to test this idea. To make it work better, they developed a special computer program that combines both types of information. They tested their program on several different datasets and showed that it works well.
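The Hopfield-based fusion described in the medium summary can be illustrated, very loosely, by the modern Hopfield retrieval update that such layers are built on: queries are repeatedly re-associated against a stored pattern memory via a softmax. The sketch below is not the authors' implementation; the token shapes, the `beta` value, and the choice to fuse by concatenating RGB and event tokens into one joint memory are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_retrieve(queries, patterns, beta=1.0, n_iter=3):
    """Modern Hopfield retrieval: iteratively replace each query with a
    softmax-weighted combination of the stored patterns.
    queries:  (num_queries, dim), patterns: (num_patterns, dim)."""
    xi = queries
    for _ in range(n_iter):
        xi = softmax(beta * xi @ patterns.T) @ patterns
    return xi

# Toy fusion: treat RGB and event tokens as one joint memory and let the
# RGB queries retrieve from it (shapes are made up for illustration).
rng = np.random.default_rng(0)
rgb_tokens = rng.normal(size=(8, 16))    # 8 RGB tokens, 16-dim each
event_tokens = rng.normal(size=(8, 16))  # 8 event-stream tokens
memory = np.concatenate([rgb_tokens, event_tokens], axis=0)  # (16, 16)
fused = hopfield_retrieve(rgb_tokens, memory, beta=2.0)
print(fused.shape)  # (8, 16)
```

A higher `beta` sharpens the softmax, so each query snaps more decisively onto a single stored pattern; in the paper's setting this retrieval sits inside multi-head self-attention blocks rather than operating on raw tokens as here.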

Keywords

* Artificial intelligence  * Object tracking  * Self attention  * Tracking  * Transformer