Joint Embedding Predictive Architecture

A Joint Embedding Predictive Architecture (JEPA) is a self-supervised learning framework that learns robust data representations by predicting the latent representation of one part of a data sample from that of another part, rather than reconstructing raw inputs. Current research applies JEPAs to diverse data modalities, including tabular data, images, audio, video, and even brain activity, typically using transformer-based encoders. The approach shows promise for improving performance on downstream tasks such as classification, generation, and prediction across many fields, and offers a powerful alternative to traditional supervised learning, especially when labeled data is scarce.
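The core idea can be sketched in a few lines: embed a "context" view and a "target" view of the same sample, train a predictor to map the context embedding onto the target embedding, and keep the target encoder as a slowly updated (EMA) copy of the context encoder. The following is a minimal toy sketch under assumed names and dimensions (`W_ctx`, `W_tgt`, `W_pred`, `dim_in`, `dim_emb` are all illustrative and not taken from any specific JEPA paper):

```python
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_emb, lr = 8, 4, 0.1  # toy sizes; real JEPAs use deep encoders

W_ctx = rng.normal(size=(dim_in, dim_emb)) / np.sqrt(dim_in)  # context encoder
W_tgt = W_ctx.copy()          # target encoder: EMA copy, receives no gradients
W_pred = np.eye(dim_emb)      # predictor head (trainable)

def step(x_ctx, x_tgt):
    """One latent-prediction step: the loss lives in embedding space,
    never in input space; updates only the predictor here for brevity."""
    global W_pred
    z_ctx = x_ctx @ W_ctx              # context embedding
    z_tgt = x_tgt @ W_tgt              # target embedding (treated as constant)
    z_pred = z_ctx @ W_pred            # predicted target embedding
    err = z_pred - z_tgt
    loss = float(np.mean(err ** 2))
    W_pred -= lr * z_ctx.T @ (2 * err / err.size)  # gradient step on predictor
    return loss

def ema_update(tau=0.99):
    """Target encoder slowly tracks the context encoder (helps avoid collapse)."""
    global W_tgt
    W_tgt = tau * W_tgt + (1 - tau) * W_ctx

# Two views of one batch of samples; the predictor learns to bridge them.
x = rng.normal(size=(16, dim_in))
x_ctx, x_tgt = x, x + 0.05 * rng.normal(size=x.shape)

losses = [step(x_ctx, x_tgt) for _ in range(20)]
ema_update()
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a full JEPA the encoders are deep networks trained end to end and the "views" come from masking or cropping; the sketch keeps only the structural ingredients: two encoders, a latent-space predictor, a latent-space loss, and the EMA target update.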

Papers