Multimodal Deep Learning
Multimodal deep learning integrates data from diverse sources, such as images, text, and audio, to build predictive models that are more robust and accurate than those trained on a single data type. Current research emphasizes efficient fusion strategies (intermediate fusion being a prominent example) and explores neural network architectures such as CNNs, RNNs, and transformers, often incorporating attention mechanisms to weigh the contributions of different modalities. By enabling more comprehensive analyses, this approach is making a significant impact in fields including healthcare (diagnostics and prognostics), autonomous driving (sensor fusion), and scientific discovery (analysis of complex datasets).
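To make the fusion idea concrete, the following is a minimal sketch of intermediate (feature-level) fusion with attention-based modality weighting. It is an illustrative assumption, not the method of any paper listed below; the encoder choices, feature dimensions, and class names (e.g., IntermediateFusionModel) are hypothetical, and PyTorch is assumed as the framework.

```python
# Minimal sketch: intermediate fusion of two modalities with attention weighting.
# Encoders, dimensions, and the gating scheme are illustrative assumptions.
import torch
import torch.nn as nn


class IntermediateFusionModel(nn.Module):
    """Encodes each modality separately, then fuses the intermediate
    features with learned attention weights before classification."""

    def __init__(self, image_dim=512, text_dim=300, hidden_dim=128, num_classes=2):
        super().__init__()
        # Modality-specific encoders (stand-ins for CNN / transformer backbones).
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        # Attention head: scores each modality's hidden vector.
        self.attention = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, image_feats, text_feats):
        # Intermediate fusion: stack the per-modality embeddings.
        h = torch.stack(
            [self.image_encoder(image_feats), self.text_encoder(text_feats)], dim=1
        )  # shape: (batch, num_modalities, hidden_dim)
        # Softmax over the modality axis gives per-sample importance weights.
        weights = torch.softmax(self.attention(h), dim=1)  # (batch, 2, 1)
        fused = (weights * h).sum(dim=1)                    # (batch, hidden_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    model = IntermediateFusionModel()
    logits = model(torch.randn(4, 512), torch.randn(4, 300))
    print(logits.shape)  # torch.Size([4, 2])
```

In this sketch, fusion happens after each modality has its own learned representation (rather than at the raw-input or decision level), and the attention weights let the model emphasize whichever modality is more informative for a given sample.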
Papers
Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning
Kaizhe Jin, Adrian Rubio-Solis, Ravi Naik, Tochukwu Onyeogulu, Amirul Islam, Salman Khan, Izzeddin Teeti, James Kinross, Daniel R Leff, Fabio Cuzzolin, George Mylonas
TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction
Numan Saeed, Ikboljon Sobirov, Roba Al Majzoub, Mohammad Yaqub