Paper ID: 2309.08273

A Generative Framework for Self-Supervised Facial Representation Learning

Ruian He, Zhen Xing, Weimin Tan, Bo Yan

Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets. However, it has not been explored sufficiently for facial representation. Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light. Prior methods primarily focus on contrastive learning and pixel-level consistency, leading to limited interpretability and suboptimal performance. In this paper, we propose LatentFace, a novel generative framework for self-supervised facial representations. We suggest that the disentangling problem can be also formulated as generative objectives in space and time, and propose the solution using a 3D-aware latent diffusion model. First, we introduce a 3D-aware autoencoder to encode face images into 3D latent embeddings. Second, we propose a novel representation diffusion model to disentangle 3D latent into facial identity and expression. Consequently, our method achieves state-of-the-art performance in facial expression recognition (FER) and face verification among self-supervised facial representation learning models. Our model achieves a 3.75\% advantage in FER accuracy on RAF-DB and 3.35\% on AffectNet compared to SOTA methods.

Submitted: Sep 15, 2023