Multi Modality
Multimodality in machine learning focuses on integrating information from diverse data sources (e.g., text, images, audio, sensor data) to improve model performance and robustness. Current research emphasizes developing effective fusion strategies within various model architectures, including transformers and autoencoders, often employing contrastive learning and techniques to handle missing modalities. This approach is proving valuable across numerous applications, from medical diagnosis and e-commerce to assistive robotics and urban planning, by enabling more comprehensive and accurate analyses than unimodal methods.
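To make the fusion idea above concrete, below is a minimal sketch of concatenation-based late fusion over two modality embeddings, with a learned placeholder vector standing in for a missing modality. All names, dimensions, and the placeholder strategy are illustrative assumptions, not the method of any paper listed here.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenation-based late fusion of text and image embeddings.

    A missing modality is replaced by a learned placeholder vector,
    one simple way to keep the fused representation well-defined.
    (Illustrative sketch; dimensions and names are assumptions.)
    """

    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Learned stand-ins used when a modality is absent for a batch.
        self.text_placeholder = nn.Parameter(torch.zeros(hidden_dim))
        self.image_placeholder = nn.Parameter(torch.zeros(hidden_dim))
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_emb=None, image_emb=None):
        batch = text_emb.size(0) if text_emb is not None else image_emb.size(0)
        t = self.text_proj(text_emb) if text_emb is not None \
            else self.text_placeholder.expand(batch, -1)
        i = self.image_proj(image_emb) if image_emb is not None \
            else self.image_placeholder.expand(batch, -1)
        # Fuse by concatenation, then classify.
        return self.classifier(torch.cat([t, i], dim=-1))

# Usage example: the image embedding is missing for this batch.
model = LateFusionClassifier()
logits = model(text_emb=torch.randn(4, 768), image_emb=None)
print(logits.shape)  # torch.Size([4, 10])
```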
Papers
Navigating to Success in Multi-Modal Human-Robot Collaboration: Analysis and Corpus Release
Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Taylor Hudson, Ron Artstein, Clare Voss, David Traum
EqDrive: Efficient Equivariant Motion Forecasting with Multi-Modality for Autonomous Driving
Yuping Wang, Jier Chen