Multimodal Approach
Multimodal approaches in machine learning integrate data from multiple sources (e.g., text, images, audio) to improve model performance and understanding over single-modality baselines. Current research focuses on developing and applying multimodal models, often combining transformer-based encoders such as BERT with convolutional networks such as ResNet, along with attention mechanisms and fusion strategies (early, mid, and late fusion) to effectively combine diverse data types. This methodology is proving valuable across numerous fields, including healthcare (e.g., disease diagnosis, medical question summarization), e-commerce (e.g., product recommendation), and safety (e.g., autonomous driving, road surface detection), where it provides more robust and nuanced insights than unimodal methods.
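As a minimal sketch of the fusion strategies mentioned above, the following contrasts early fusion (concatenating modality features before a single classifier) with late fusion (classifying each modality separately and averaging the predictions). The feature dimensions, the toy linear classifier, and the random features standing in for encoder outputs are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-modality feature vectors (dimensions are assumptions):
# in practice these would come from encoders such as BERT (text) or ResNet (images).
text_feat = rng.standard_normal(8)
image_feat = rng.standard_normal(16)

def linear_head(x, n_classes=3, seed=1):
    """Toy linear classifier returning softmax class probabilities."""
    w = np.random.default_rng(seed).standard_normal((x.shape[0], n_classes))
    logits = x @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Early fusion: concatenate raw features, then classify jointly.
early_probs = linear_head(np.concatenate([text_feat, image_feat]))

# Late fusion: classify each modality independently, then average predictions.
late_probs = (linear_head(text_feat, seed=2) + linear_head(image_feat, seed=3)) / 2
```

Early fusion lets the classifier model cross-modal interactions directly, while late fusion keeps modalities independent, which can be more robust when one modality is missing or noisy; mid fusion combines intermediate representations and sits between the two.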
Papers
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner, Alexander Churchill, Siddharth Sigtia, Panayiotis Georgiou, Matt Mirsamadi, Aarshee Mishra, Erik Marchi
MOGAM: A Multimodal Object-oriented Graph Attention Model for Depression Detection
Junyeop Cha, Seoyun Kim, Dongjae Kim, Eunil Park