Different Modalities
Multimodal learning focuses on integrating information from diverse data sources (e.g., text, images, audio) to improve model performance and robustness. Current research emphasizes efficient fusion techniques and addresses challenges such as missing modalities through contrastive learning, modality-aware adaptation, and progressive alignment with lightweight architectures such as OneEncoder. This line of work advances AI capabilities in applications including medical diagnosis, visual question answering, and human activity recognition by enabling more comprehensive and reliable analysis of complex data.
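To make the contrastive-alignment idea mentioned above concrete, the sketch below shows a generic CLIP-style symmetric contrastive loss that pulls paired embeddings from two modalities together. It is a minimal illustration under assumed conditions (paired image/text feature batches, PyTorch available); the function and variable names are hypothetical and do not correspond to any of the listed papers' methods.

```python
# Minimal sketch (illustrative, not from any listed paper): a CLIP-style
# symmetric contrastive loss aligning embeddings from two modalities,
# assuming paired batches of image and text features.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb: torch.Tensor,
                               txt_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of paired (image, text) embeddings."""
    # L2-normalise so the dot product becomes a cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = img_emb @ txt_emb.t() / temperature

    # Matching pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example usage with random features standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 8, 256
    images = torch.randn(batch, dim)
    texts = torch.randn(batch, dim)
    print(contrastive_alignment_loss(images, texts).item())
```

In practice the two inputs would come from separate modality encoders, and the same loss shape extends to other modality pairs (e.g., audio/text) when paired data is available.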
Papers
Revisited Large Language Model for Time Series Analysis through Modality Alignment
Liangwei Nathan Zheng, Chang George Dong, Wei Emma Zhang, Lin Yue, Miao Xu, Olaf Maennel, Weitong Chen
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lichang Chen, Hexiang Hu, Mingda Zhang, Yiwen Chen, Zifeng Wang, Yandong Li, Pranav Shyam, Tianyi Zhou, Heng Huang, Ming-Hsuan Yang, Boqing Gong
FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization
Manh Duong Nguyen, Trung Thanh Nguyen, Huy Hieu Pham, Trong Nghia Hoang, Phi Le Nguyen, Thanh Trung Huynh
Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting and Reasoning Tasks
Grant Wardle, Teo Susnjak