Multimodal Dataset
Multimodal datasets integrate data from diverse sources, such as text, images, audio, and sensor readings, to improve machine learning performance on complex tasks. Current research focuses on building and applying such datasets across domains including remote sensing, healthcare, and robotics, often using transformer-based architectures and contrastive learning to fuse information from the different modalities. High-quality multimodal datasets are therefore crucial for advancing artificial intelligence research and for developing more robust and accurate systems across a wide range of applications.
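To make the contrastive-fusion idea above concrete, the sketch below aligns two modalities in a shared embedding space with a symmetric InfoNCE-style loss, in the spirit of CLIP. It is a minimal illustration, not the method of any paper listed here: the `ProjectionHead` module, the embedding dimensions, and the random features standing in for real encoder outputs are all assumptions made for the example.

```python
# Minimal sketch of contrastive multimodal alignment (CLIP-style).
# Encoder outputs are simulated with random tensors; dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps modality-specific features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so dot products become cosine similarities.
        return F.normalize(self.proj(x), dim=-1)

def contrastive_loss(img_emb: torch.Tensor,
                     txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))         # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i + loss_t) / 2

# Toy usage: random "features" stand in for outputs of real modality encoders.
img_head = ProjectionHead(in_dim=512)  # e.g. pooled output of a vision transformer
txt_head = ProjectionHead(in_dim=768)  # e.g. pooled output of a text transformer
img_feats, txt_feats = torch.randn(8, 512), torch.randn(8, 768)
loss = contrastive_loss(img_head(img_feats), txt_head(txt_feats))
print(f"contrastive loss: {loss.item():.4f}")
```

The same pattern extends to other modality pairs (audio, sensor streams) by swapping in an appropriate encoder per modality; the shared embedding space and symmetric loss are what make the fusion contrastive.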
Papers
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
Naval Kishore Mehta, Arvind, Himanshu Kumar, Abeer Banerjee, Sumeet Saurav, Sanjay Singh
Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models
Sofia Jamil, Bollampalli Areen Reddy, Raghvendra Kumar, Sriparna Saha, K J Joseph, Koustava Goswami
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Guillaume Astruc, Nicolas Gonthier, Clément Mallet, Loïc Landrieu
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
Vageesh Saxena, Benjamin Bashpole, Gijs Van Dijck, Gerasimos Spanakis