Paper ID: 2411.01767

Data Augmentations Go Beyond Encoding Invariances: A Theoretical Study on Self-Supervised Learning

Shlomo Libo Feigin, Maximilian Fleissner, Debarghya Ghoshdastidar

Understanding the role of data augmentations is critical for applying Self-Supervised Learning (SSL) methods in new domains. Data augmentations are commonly understood as encoding invariances into the learned representations. This interpretation suggests that SSL would require diverse augmentations that resemble the original data. However, in practice, augmentations need to be neither similar to the original data nor diverse, and can fail both criteria at the same time. We provide theoretical insight into this phenomenon. We show that for different SSL losses, any non-redundant representation can be learned with a single suitable augmentation. We provide an algorithm to reconstruct such augmentations and give insights into augmentation choices in SSL.

Submitted: Nov 4, 2024