Multi-Modal Data
Multi-modal data analysis focuses on integrating information from diverse sources, such as images, text, audio, and sensor data, to achieve more comprehensive and accurate insights than any single modality can provide alone. Current research emphasizes developing robust models, often based on transformer architectures and contrastive learning, that can effectively fuse these disparate data types, handle missing data, and address issues such as noisy labels and modality mismatches. This field is crucial for advancing numerous applications, including medical diagnosis, urban planning, materials science, and traffic prediction, by enabling more sophisticated and reliable analyses of complex systems.
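To make the contrastive-learning approach mentioned above concrete, the sketch below shows a minimal CLIP-style alignment of two modalities (here, image and text) in a shared embedding space. The encoder output dimensions, projection heads, batch size, and temperature are illustrative assumptions and are not drawn from any of the papers listed below.

```python
# Minimal sketch of contrastive multi-modal alignment (CLIP-style).
# Assumes pre-computed per-sample features for each modality; the dimensions
# and temperature below are hypothetical choices for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveFusion(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, embed_dim=256, temperature=0.07):
        super().__init__()
        # Projection heads mapping each modality into a shared embedding space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.temperature = temperature

    def forward(self, image_features, text_features):
        # Project and L2-normalise both modalities.
        img = F.normalize(self.image_proj(image_features), dim=-1)
        txt = F.normalize(self.text_proj(text_features), dim=-1)
        # Pairwise cosine similarities between all image/text pairs in the batch.
        logits = img @ txt.t() / self.temperature
        # Matching pairs lie on the diagonal; apply a symmetric cross-entropy loss.
        targets = torch.arange(logits.size(0), device=logits.device)
        loss = (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
        return loss

# Usage with random stand-in features for a batch of 8 paired samples.
image_feats = torch.randn(8, 2048)
text_feats = torch.randn(8, 768)
model = ContrastiveFusion()
print(model(image_feats, text_feats))
```

The symmetric loss pulls embeddings of paired samples together and pushes apart mismatched pairs within the batch, which is the basic mechanism many multi-modal fusion models build on.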
Papers
Multi-modal Data based Semi-Supervised Learning for Vehicle Positioning
Ouwen Huan, Yang Yang, Tao Luo, Mingzhe Chen
BSM: Small but Powerful Biological Sequence Model for Genes and Proteins
Weixi Xiang, Xueting Han, Xiujuan Chai, Jing Bai
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Zhifei Xie, Changqiao Wu