Summary of AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation, by Yansheng Li et al.
AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation
by Yansheng Li, Kun Li, Yongjun Zhang, Linlin Wang, Dingwen Zhang
First submitted to arxiv on: 11 Apr 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract (not reproduced on this page) |
Medium | GrooveSquid.com (original content) | Scene graph generation (SGG) is a core computer vision task that aims to identify the objects in a single image and the semantic relationships between them. Existing SGG datasets focus on eye-level views, so this paper addresses the scarcity of overhead-view datasets by releasing the aerial image urban scene graph generation (AUG) dataset. Captured from low-altitude overhead views, AUG contains 25,594 annotated objects, 16,970 relationships, and 27,175 attributes. To handle the complexity of aerial urban scenes, the paper proposes a locality-preserving graph convolutional network (LPG), which integrates initial features with dynamically updated neighborhood information so that it captures global context without losing local context. It also introduces an adaptive bounding box scaling factor for potential relationship detection (ABS-PRD), which prunes meaningless relationship pairs. Experiments on the AUG dataset show that LPG outperforms state-of-the-art methods, supporting the effectiveness of the locality-preserving strategy. |
Low | GrooveSquid.com (original content) | Imagine taking a bird’s-eye-view photo of a city or town. This paper builds a new collection of such aerial photos, labeled so that computers can learn to understand what they show. The goal is to teach computers to recognize objects like buildings, roads, and trees, as well as how those objects relate to one another. The authors propose two new techniques: one that helps the computer focus on local details while still seeing the bigger picture, and another that skips pairs of objects that are unlikely to be related, so the computer doesn’t waste effort on them. Testing these ideas on their new dataset, they show that their approach understands these aerial images better than existing methods. |
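
The locality-preserving idea behind LPG can be pictured as a graph convolution in which every layer mixes each node’s aggregated neighborhood features with that node’s initial (pre-propagation) features, so stacking layers widens the receptive field without washing out local detail. The sketch below illustrates that reading in plain Python. It is an assumption-laden toy, not the authors’ formulation: the mean-aggregation rule, the mixing weight `alpha`, and the function names are hypothetical, and learned weight matrices and nonlinearities are omitted.

```python
from typing import List

Matrix = List[List[float]]


def mean_aggregate(features: Matrix, adjacency: List[List[int]]) -> Matrix:
    """Average each node's own features with those of its neighbors."""
    out = []
    for i, neighbors in enumerate(adjacency):
        group = [i] + neighbors  # include a self-loop
        dim = len(features[i])
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def locality_preserving_propagate(
    initial: Matrix, adjacency: List[List[int]], num_layers: int, alpha: float = 0.5
) -> Matrix:
    """Hypothetical LPG-style propagation (illustrative only).

    Each layer blends the freshly aggregated neighborhood features with the
    node's *initial* features: h <- alpha * h0 + (1 - alpha) * mean(h over N(i) and i).
    Learned weights and nonlinearities are deliberately left out.
    """
    h = [row[:] for row in initial]
    for _ in range(num_layers):
        aggregated = mean_aggregate(h, adjacency)
        h = [
            [alpha * x0 + (1 - alpha) * a for x0, a in zip(x0_row, agg_row)]
            for x0_row, agg_row in zip(initial, aggregated)
        ]
    return h
```

With `alpha=0` the update reduces to plain repeated neighborhood averaging, which on a connected graph drives all nodes toward a common value as layers are added; a positive `alpha` keeps each node anchored to its own starting features no matter how deep the stack is.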
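
ABS-PRD prunes candidate subject-object pairs before relationships are classified. One way to picture this, assuming the rule is “enlarge each bounding box by a size-dependent factor and keep only pairs whose enlarged boxes overlap,” is the sketch below. The scaling formula in `adaptive_scale` (smaller boxes get a larger factor, bounded by the hypothetical `min_factor` and `max_factor` parameters) and every name here are illustrative assumptions, not the paper’s definition.

```python
import math
from itertools import permutations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def adaptive_scale(box: Box, image_area: float,
                   min_factor: float = 1.2, max_factor: float = 3.0) -> float:
    """Hypothetical size-dependent factor: small boxes are enlarged more."""
    x1, y1, x2, y2 = box
    # Guard against degenerate zero-area boxes.
    rel_area = max((x2 - x1) * (y2 - y1), 1e-9) / image_area
    # sqrt(relative area) lies in (0, 1]; invert it so that tiny objects get
    # close to max_factor and image-sized objects get min_factor.
    t = 1.0 - math.sqrt(min(rel_area, 1.0))
    return min_factor + t * (max_factor - min_factor)


def scale_box(box: Box, factor: float) -> Box:
    """Enlarge a box about its center by `factor` along each axis."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * factor / 2, (y2 - y1) * factor / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidate_pairs(boxes: List[Box], image_area: float) -> List[Tuple[int, int]]:
    """Keep ordered (subject, object) pairs whose scaled boxes intersect."""
    scaled = [scale_box(b, adaptive_scale(b, image_area)) for b in boxes]
    return [(i, j) for i, j in permutations(range(len(boxes)), 2)
            if overlaps(scaled[i], scaled[j])]


# Usage: two nearby small objects survive, a distant one is pruned.
boxes = [(10, 10, 20, 20), (25, 10, 35, 20), (80, 80, 90, 90)]
print(candidate_pairs(boxes, image_area=100 * 100))  # [(0, 1), (1, 0)]
```

Pairs are ordered because scene-graph relationships are directed (subject to object). The first two boxes do not touch before scaling, so the enlargement is what lets this nearby pair through, while the far-away third box is dropped. Tying the factor to object size is only one plausible reading of “adaptive”; the paper’s actual criterion may rely on other cues.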
Keywords
- Artificial intelligence
- Bounding box
- Convolutional network