
Summary of How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks, by Etai Littwin et al.


How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

by Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind

First submitted to arXiv on: 3 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper examines the mechanisms behind two self-supervised learning paradigms: the Joint Embedding Predictive Architecture (JEPA) and the Masked AutoEncoder (MAE). JEPA, which is trained via self-distillation, predicts missing content in latent space and therefore prioritizes abstract features over pixel-level detail, whereas MAE reconstructs the missing parts of the input directly in data space. By analyzing the training dynamics of deep linear models, the authors explain why JEPAs tend to learn high-influence features, i.e., features with high regression coefficients. Their findings suggest that this implicit bias towards predicting in latent space may contribute to JEPA's success in practice. (A toy sketch of the two training objectives appears after the summaries below.)

Low Difficulty Summary (original content by GrooveSquid.com)
This research compares two ways to train AI models without labeled data. One method, called the Joint Embedding Predictive Architecture (JEPA), focuses on learning abstract features that are important for understanding the data. The other method, the Masked AutoEncoder (MAE), tries to reconstruct missing parts of the input data. The paper looks at how these two approaches work and why JEPA tends to learn more useful features than MAE.

Keywords

» Artificial intelligence  » Autoencoder  » Distillation  » Embedding  » Latent space  » MAE  » Regression  » Self-supervised