Summary of GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model, by Ling Li et al.
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by Ling Li, Yu Ye, Bingchuan Jiang, Wei Zeng
First submitted to arXiv on: 3 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper presents an approach to geo-localization that augments a large vision-language model (LVLM) with human inference knowledge. To address the scarcity of high-quality training data, the authors create a new dataset by quantifying the locatability of street-view images. They also integrate external knowledge from real geo-localization games to strengthen the model's reasoning, and use both sources to train GeoReasoner. In the reported results, GeoReasoner outperforms other LVLMs by more than 25% on country-level and 38% on city-level geo-localization tasks, and it surpasses StreetCLIP while requiring fewer training resources. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper tackles the problem of working out where in the world a street-view photo was taken. To make computer models better at this, the authors combine two kinds of data: images of streets and real-world knowledge from people who play geo-localization games. The resulting model, GeoReasoner, is more accurate than comparable earlier models: the results show it does more than 25% better at country-level location tasks and 38% better at city-level tasks. |
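
The dataset step described in the medium summary, keeping street views according to how locatable they are, can be illustrated with a minimal sketch. This is not the authors' implementation: the `StreetView` class, the `select_locatable` function, the 0-to-1 score range, and the 0.5 threshold are all assumptions made for illustration, and the scores are made-up values rather than outputs of any real model.

```python
from dataclasses import dataclass


@dataclass
class StreetView:
    image_id: str
    # Hypothetical locatability score in [0, 1]; higher means the image
    # carries more clues about where it was taken.
    locatability: float


def select_locatable(views, threshold=0.5):
    """Keep only the street views whose score meets the (assumed) threshold."""
    return [v for v in views if v.locatability >= threshold]


# Made-up example data, not taken from the paper.
views = [
    StreetView("blank_wall", 0.05),
    StreetView("street_sign", 0.90),
    StreetView("storefront", 0.60),
]

kept = select_locatable(views, threshold=0.5)
print([v.image_id for v in kept])  # ['street_sign', 'storefront']
```

Raising the threshold trades dataset size for quality: fewer images survive, but the ones that remain are more likely to contain usable location cues.
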
Keywords
* Artificial intelligence
* Inference
* Language model