Multimodal Dataset
Multimodal datasets combine data from several modalities, such as text, images, audio, and sensor readings, so that machine learning models can draw on complementary signals when solving complex tasks. Current research builds and applies these datasets in domains including remote sensing, healthcare, and robotics, often using transformer-based architectures and contrastive learning to fuse information across modalities. High-quality multimodal datasets are therefore essential for advancing research in artificial intelligence and for building more robust and accurate systems.
Papers
A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning
Fei Wang, Chengcheng Chen, Hongyu Chen, Yugang Chang, Weiming Zeng
Classifier-guided Gradient Modulation for Enhanced Multimodal Learning
Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao
MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages
Shubhi Bansal, Nishit Sushil Singh, Shahid Shafi Dar, Nagendra Kumar
Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification
Sadam Hussain, Mansoor Ali, Usman Naseem, Beatriz Alejandra Bosques Palomo, Mario Alexis Monsivais Molina, Jorge Alberto Garza Abdala, Daly Betzabeth Avendano Avalos, Servando Cardona-Huerta, T. Aaron Gulliver, Jose Gerardo Tamez Pena
Maven: A Multimodal Foundation Model for Supernova Science
Gemma Zhang, Thomas Helfer, Alexander T. Gagliano, Siddharth Mishra-Sharma, V. Ashley Villar
RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio
Kian Behzad, Rojin Zandi, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami