Multimodality
Multimodality in machine learning focuses on integrating information from diverse data sources (e.g., text, images, audio, sensor data) to improve model performance and robustness. Current research emphasizes developing effective fusion strategies within various model architectures, including transformers and autoencoders, often employing contrastive learning and techniques to handle missing modalities. This approach is proving valuable across numerous applications, from medical diagnosis and e-commerce to assistive robotics and urban planning, by enabling more comprehensive and accurate analyses than unimodal methods.
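To make the fusion idea concrete, here is a minimal sketch of late fusion by concatenation with a placeholder for missing modalities. The function name, the fixed modality list, and the zero-vector placeholder are illustrative assumptions, not from any of the papers below; real systems typically use learned projections and learned missing-modality tokens instead.

```python
import numpy as np

MODALITIES = ["text", "image", "audio"]  # assumed ordering for this sketch

def fuse_modalities(embeddings, dim=4):
    """Concatenate per-modality embedding vectors into one fused vector.

    `embeddings` maps modality names to 1-D arrays of length `dim`.
    A missing modality is replaced by a zero vector (a stand-in for
    the learned placeholder tokens used in practice).
    """
    parts = []
    for name in MODALITIES:
        vec = embeddings.get(name)
        parts.append(np.asarray(vec) if vec is not None else np.zeros(dim))
    return np.concatenate(parts)

# Usage: audio is absent, so its slot is filled by the placeholder.
fused = fuse_modalities({"text": np.ones(4), "image": 2 * np.ones(4)})
print(fused.shape)  # (12,)
```

A downstream classifier can then operate on the fused vector; more elaborate strategies (cross-attention in transformers, contrastive alignment of modality encoders) replace plain concatenation but follow the same integrate-then-predict pattern.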
Papers
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach
T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities
Kangfu Mei, Mo Zhou, Vishal M. Patel