Modal Disentanglement

Modal disentanglement aims to separate the independent factors of variation in multimodal data, so that individual modalities and the relationships between them can be better understood and manipulated. Current research focuses on methods that remain robust when the data are noisy or imperfectly aligned across modalities, often using information-theoretic frameworks and hierarchical architectures with optimized codebooks. These methods are improving performance in applications such as cross-modal retrieval, face anti-spoofing, and the generation of photorealistic virtual humans, because they allow more accurate and nuanced analysis and synthesis of multimodal information.

Papers