Multimodal Deep Learning
Multimodal deep learning integrates data from diverse sources (e.g., images, text, audio) to build predictive models that are more robust and accurate than those trained on a single data type. Current research emphasizes efficient fusion strategies, with intermediate (feature-level) fusion a prominent example, and explores neural network architectures such as CNNs, RNNs, and transformers, often adding attention mechanisms that weigh the contribution of each modality. The approach is having a significant impact in healthcare (diagnostics and prognostics), autonomous driving (sensor fusion), and scientific discovery (analysis of complex datasets) by enabling more comprehensive and insightful analyses.
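To make the fusion strategy concrete, below is a minimal PyTorch sketch of intermediate (feature-level) fusion with a learned attention weighting over modalities. The encoder outputs, dimensions, and module names are illustrative assumptions, not the method of any particular paper listed on this page.

```python
# Minimal sketch of intermediate (feature-level) fusion with attention over
# modalities. All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuses per-modality feature vectors with learned attention weights."""

    def __init__(self, dims, fused_dim=256, num_classes=2):
        super().__init__()
        # One linear projection per modality maps its features into a shared space.
        self.projections = nn.ModuleList([nn.Linear(d, fused_dim) for d in dims])
        # Scores each projected modality embedding; softmax over modalities
        # turns the scores into attention weights that sum to 1.
        self.score = nn.Linear(fused_dim, 1)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, features):
        # features: list of tensors, one per modality, each of shape (batch, dim_i)
        projected = torch.stack(
            [proj(f) for proj, f in zip(self.projections, features)], dim=1
        )  # (batch, num_modalities, fused_dim)
        weights = torch.softmax(self.score(projected), dim=1)  # (batch, M, 1)
        fused = (weights * projected).sum(dim=1)               # weighted sum over modalities
        return self.classifier(fused), weights.squeeze(-1)


# Example: image, text, and audio features assumed to come from separate
# unimodal encoders computed upstream.
img_feat, txt_feat, aud_feat = torch.randn(8, 512), torch.randn(8, 768), torch.randn(8, 128)
model = AttentionFusion(dims=[512, 768, 128])
logits, modality_weights = model([img_feat, txt_feat, aud_feat])
```

The returned attention weights make the relative contribution of each modality inspectable per sample, which is one reason attention-based intermediate fusion is often preferred over simple feature concatenation.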
48 papers
November 4, 2022
Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning
Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, Tochukwu Onyeogulu, Amirul Islam, Salman Khan, Izzeddin Teeti, James Kinross, Daniel R Leff, +2 authors

September 12, 2022
TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction
Numan Saeed, Ikboljon Sobirov, Roba Al Majzoub, Mohammad Yaqub