Multi-Modal Feature
Multi-modal feature research focuses on effectively integrating information from diverse data sources (e.g., images, text, audio, sensor data) to improve the performance of machine learning models. Current research emphasizes efficient fusion techniques, often employing transformer-based architectures and graph neural networks, to overcome challenges like modality gaps and missing data. This field is significant for advancing various applications, including personalized recommendations, medical diagnosis, autonomous driving, and human-computer interaction, by enabling more robust and accurate systems. The development of modality-agnostic models, capable of handling incomplete or varying data modalities, is a key area of ongoing investigation.
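To make the fusion idea above concrete, the sketch below shows one common transformer-style approach: cross-attention, where tokens from one modality attend to tokens from another. This is a minimal illustration only, not the method of any paper listed here; the class name CrossModalFusion, the feature dimensions, and the toy inputs are all assumptions chosen for the example.

```python
# Minimal sketch of cross-attention fusion between two modalities,
# assuming image and text features were already extracted by separate encoders.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse text features into image features via multi-head cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # Image tokens act as queries over the text tokens; the attended output
        # is added back residually, then refined by a feed-forward block.
        attended, _ = self.attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
        fused = self.norm(img_tokens + attended)
        return fused + self.ffn(fused)


if __name__ == "__main__":
    # Toy shapes: 2 samples, 196 image patches and 32 text tokens, 256-dim features.
    img = torch.randn(2, 196, 256)
    txt = torch.randn(2, 32, 256)
    print(CrossModalFusion()(img, txt).shape)  # torch.Size([2, 196, 256])
```

In practice such a block would sit inside a larger model, and handling missing modalities (e.g., dropping the text branch at inference) is exactly the kind of problem the modality-agnostic work mentioned above targets.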
Papers
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging
Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He
Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation
Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, Jing Liu
MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer
Chengyu Wu, Chengkai Wang, Yaqi Wang, Huiyu Zhou, Yatao Zhang, Qifeng Wang, Shuai Wang
ReconBoost: Boosting Can Achieve Modality Reconcilement
Cong Hua, Qianqian Xu, Shilong Bao, Zhiyong Yang, Qingming Huang