Multi-Modal Models
Multi-modal models integrate and process information from multiple data sources (e.g., text, images, audio) to achieve a more comprehensive understanding than unimodal approaches. Current research focuses on improving robustness, efficiency, and generalization across diverse tasks, typically using transformer-based architectures together with techniques such as self-supervised learning, fine-tuning, and modality fusion. These advances enable more accurate and nuanced interpretation of complex real-world data, with applications ranging from assistive robotics and medical image analysis to extended large language model capabilities.
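As a minimal sketch of the modality fusion mentioned above (not the method of any paper listed on this page), the example below shows a simple late-fusion module in PyTorch: each modality's embedding is projected into a shared space, concatenated, and classified. All names (e.g., LateFusionClassifier), dimensions, and the concatenation-based fusion choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative late fusion: project each modality's embedding into a
    shared space, concatenate, and classify. Dimensions are assumptions."""
    def __init__(self, text_dim=768, image_dim=512, hidden_dim=256, num_classes=10):
        super().__init__()
        # Per-modality projections into a common hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Classification head over the concatenated (fused) representation.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden_dim, num_classes),
        )

    def forward(self, text_emb, image_emb):
        # Fuse by concatenating the projected modality embeddings.
        fused = torch.cat([self.text_proj(text_emb), self.image_proj(image_emb)], dim=-1)
        return self.classifier(fused)

# Example: a batch of 4 paired text/image embeddings (random stand-ins).
logits = LateFusionClassifier()(torch.randn(4, 768), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 10])
```

More sophisticated fusion strategies (e.g., cross-attention between modalities) follow the same pattern but mix the modality streams inside the network rather than only at the classification head.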
92 papers
Papers
May 20, 2025
AKRMap: Adaptive Kernel Regression for Trustworthy Visualization of Cross-Modal Embeddings
Yilin Ye, Junchao Huang, Xingchen Zeng, Jiazhi Xia, Wei Zeng
The Hong Kong University of Science and Technology (Guangzhou) ● The Hong Kong University of Science and Technology ● The Chinese University... (+2)

Towards a Foundation Model for Communication Systems
Davide Buffelli, Sowmen Das, Yu-Wei Lin, Sattar Vakili, Chien-Yi Wang, Masoud Attarifar, Pritthijit Nath, Da-shan Shiu
London ● Hsinchu ● Cambourne ● University of Cambridge ● Taipei