Multimodal Model
Multimodal models integrate information from multiple modalities, such as text, images, audio, and video, to achieve a more comprehensive understanding than unimodal approaches. Current research focuses on improving model interpretability, addressing biases, enhancing robustness to adversarial attacks and missing data, and developing efficient architectures, such as transformers and state-space models, for tasks including image captioning, question answering, and sentiment analysis. These advances matter for applications ranging from healthcare and robotics to general-purpose AI systems, and they drive progress in both the fundamental understanding and the practical deployment of AI.
Papers
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications
Monica Riedler, Stefan Langer
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data
Badr AlKhamissi, Yingtian Tang, Abdülkadir Gökce, Johannes Mehrer, Martin Schrimpf
Turn-by-Turn Indoor Navigation for the Visually Impaired
Santosh Srinivasaiah, Sai Kumar Nekkanti, Rohith Reddy Nedhunuri
A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT
Nagarajan Ganapathy, Podakanti Satyajith Chary, Teja Venkata Ramana Kumar Pithani, Pavan Kavati, Arun Kumar S
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura, Ahmed Heakl, Omkar Thawakar, Ali Alharthi, Ines Riahi, Abduljalil Saif, Jorma Laaksonen, Fahad S. Khan, Salman Khan, Rao M. Anwer
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez, Manuel Benavent-Lledo, Jose Garcia-Rodriguez, David Tomás, M. Flores Vizcaya-Moreno
A Survey of Multimodal Sarcasm Detection
Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri