EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

by Zhe Wang, Siqi Fan, Xiaoliang Huo, Tongda Xu, Yan Wang, Jingjing Liu, Yilun Chen, Ya-Qin Zhang

First submitted to arxiv on: 23 Feb 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract serves as the high difficulty summary.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel framework for cooperative 3D object detection in autonomous driving, addressing two major challenges: pose errors caused by camera asynchrony and information loss due to limited communication bandwidth. The Enhanced Multi-scale Image Feature Fusion (EMIFF) approach combines multi-view cameras from vehicles and infrastructure to provide rich semantic context, using Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to correct pose errors. A Feature Compression (FC) module is also introduced for transmission efficiency. Experimental results on the DAIR-V2X-C dataset show that EMIFF achieves state-of-the-art performance, outperforming previous methods with comparable transmission costs.
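The summary above mentions a Camera-aware Channel Masking (CCM) module that uses camera parameters to reweight image features. The paper's exact implementation is not given here, so the following is only a minimal sketch of the general idea, assuming a small MLP that maps a camera-parameter vector to per-channel weights; all function and variable names are hypothetical.

```python
import numpy as np

def camera_aware_channel_mask(features, cam_params, w1, b1, w2, b2):
    """Hypothetical sketch of camera-aware channel masking.

    A two-layer MLP maps camera parameters (e.g. flattened intrinsics
    and extrinsics) to a per-channel weight in (0, 1), which rescales
    the image feature map channel-wise.

    features:   array of shape (C, H, W)
    cam_params: array of shape (P,)
    w1, b1:     first-layer weights (P, D) and bias (D,)
    w2, b2:     second-layer weights (D, C) and bias (C,)
    """
    hidden = np.maximum(cam_params @ w1 + b1, 0.0)        # ReLU hidden layer
    mask = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))      # sigmoid -> (0, 1) per channel
    return features * mask[:, None, None]                 # broadcast over H and W
```

In this reading, features from cameras whose viewpoints are less informative can be down-weighted before fusion; the real module in the paper may condition on different inputs or use a different network shape.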
Low Difficulty Summary (original content by GrooveSquid.com)
This paper helps self-driving cars see better by combining camera views from vehicles and roadside infrastructure. Right now, it's tricky to correct for differences in camera angles and timing between vehicle and infrastructure cameras, which can make it hard to detect objects like pedestrians or traffic signs. The authors also want to reduce the amount of data that needs to be transmitted wirelessly. They propose a new approach called Enhanced Multi-scale Image Feature Fusion (EMIFF). It uses special attention modules to correct for camera differences and compresses the features to make them easier to send. The results show that their approach works really well on a benchmark dataset.

Keywords

» Artificial intelligence  » Attention  » Cross attention  » Object detection