
Summary of UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling, by Kai Yu et al.


UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling

by Kai Yu, Yang Zhou, Yang Bai, Zhi Da Soh, Xinxing Xu, Rick Siow Mong Goh, Ching-Yu Cheng, Yong Liu

First submitted to arXiv on: 10 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Retinal foundation models aim to learn generalizable representations from diverse retinal images, facilitating label-efficient model adaptation across various ophthalmic tasks. Our paper introduces UrFound, a retinal foundation model that learns universal representations from both multimodal retinal images and domain knowledge. UrFound is equipped with a modality-agnostic image encoder and accepts either Color Fundus Photography (CFP) or Optical Coherence Tomography (OCT) images as inputs. We propose a knowledge-guided masked modeling strategy for model pre-training, which involves reconstructing randomly masked patches of retinal images while predicting masked text tokens conditioned on the corresponding retinal image. This approach aligns multimodal images and textual expert annotations within a unified latent space, facilitating generalizable and domain-specific representation learning. Experimental results demonstrate that UrFound exhibits strong generalization ability and data efficiency when adapting to various tasks in retinal image analysis. By training on ~180k retinal images, UrFound significantly outperforms the state-of-the-art retinal foundation model trained on up to 1.6 million unlabelled images across 8 public retinal datasets.
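The pre-training recipe described above — randomly mask image patches, mask text tokens from the expert annotations, then jointly reconstruct the patches and predict the tokens conditioned on the image — can be sketched in a few lines. This is a minimal illustration, not the authors’ code: the patch count, masking ratios, and loss weighting below are assumed for demonstration only.

```python
import random

def sample_mask(n_items, mask_ratio, rng):
    """Pick which positions (image patches or text tokens) to mask out."""
    n_mask = int(n_items * mask_ratio)
    return set(rng.sample(range(n_items), n_mask))

def combined_objective(image_recon_loss, text_pred_loss, text_weight=1.0):
    """Joint objective in the spirit of knowledge-guided masked modeling:
    reconstruct masked image patches while predicting masked text tokens
    conditioned on the image. `text_weight` is an assumed hyperparameter,
    not taken from the paper."""
    return image_recon_loss + text_weight * text_pred_loss

rng = random.Random(0)
# Illustrative sizes/ratios (e.g. a 14x14 patch grid and MAE-style 75%
# image masking are assumptions, not values reported by the authors).
patch_mask = sample_mask(n_items=196, mask_ratio=0.75, rng=rng)  # image patches
token_mask = sample_mask(n_items=32, mask_ratio=0.5, rng=rng)    # report tokens

print(len(patch_mask), len(token_mask))  # number of masked positions per modality
```

Because both losses are computed in the same forward pass over a shared latent space, the image encoder is pushed to encode whatever the text branch needs to fill in the masked annotation tokens — which is how the expert knowledge shapes the visual representation.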
Low Difficulty Summary (written by GrooveSquid.com, original content)
UrFound is a new way to teach computers about retinas, helping them understand images from different sources along with written medical knowledge. It’s like teaching a computer to read medical reports alongside the images. This helps the computer make better decisions when looking at retinal images, which can help doctors diagnose eye problems earlier and more accurately.

Keywords

» Artificial intelligence  » Encoder  » Generalization  » Latent space  » Representation learning