Multimodal Phenomenon
Multimodal research develops artificial intelligence systems that can process and integrate information from multiple data sources (e.g., text, images, audio, video). Current efforts concentrate on improving the robustness and accuracy of multimodal large language models (MLLMs) through techniques such as chain-of-thought prompting, contrastive learning, and multimodal masked autoencoders, often addressing challenges such as hallucination mitigation and efficient resource use on edge devices. The field is significant because it enables a more comprehensive and nuanced understanding of complex phenomena, with applications ranging from medical diagnosis and drug discovery to human-robot interaction and educational tools. The development of robust benchmarks and open-source tools is also a key focus, facilitating collaborative research and development. A short illustrative sketch of one of these techniques follows.
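As a concrete illustration of one technique named above, the sketch below shows a CLIP-style symmetric contrastive loss that aligns paired image and text embeddings. It is a minimal example under assumed inputs (batched, already-encoded embeddings of arbitrary dimension) and is not drawn from any of the papers listed here.

```python
# Minimal sketch of CLIP-style multimodal contrastive alignment.
# The embedding dimensions and temperature are illustrative assumptions,
# not values taken from any of the listed papers.
import torch
import torch.nn.functional as F


def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize embeddings so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # Matching pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Toy batch: 8 paired embeddings of dimension 256 from hypothetical encoders.
    img = torch.randn(8, 256)
    txt = torch.randn(8, 256)
    print(contrastive_loss(img, txt).item())
```

In practice the two embeddings would come from separate image and text encoders trained jointly; pulling matched pairs together while pushing mismatched pairs apart is what gives the shared multimodal representation space.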
Papers
Learning Multimodal Confidence for Intention Recognition in Human-Robot Interaction
Xiyuan Zhao, Huijun Li, Tianyuan Miao, Xianyi Zhu, Zhikai Wei, Aiguo Song
Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis
Ziyan Yao, Fei Lin, Sheng Chai, Weijie He, Lu Dai, Xinghui Fei
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
Shezheng Song, Shasha Li, Shan Zhao, Chengyu Wang, Xiaopeng Li, Jie Yu, Qian Wan, Jun Ma, Tianwei Yan, Wentao Ma, Xiaoguang Mao
Integrating Large Language Models with Multimodal Virtual Reality Interfaces to Support Collaborative Human-Robot Construction Work
Somin Park, Carol C. Menassa, Vineet R. Kamat
HiMAL: A Multimodal Hierarchical Multi-task Auxiliary Learning framework for predicting and explaining Alzheimer disease progression
Sayantan Kumar, Sean Yu, Andrew Michelson, Thomas Kannampallil, Philip Payne