Multi-Modal Feature
Multi-modal feature research focuses on effectively integrating information from diverse data sources (e.g., images, text, audio, sensor data) to improve the performance of machine learning models. Current work emphasizes efficient fusion techniques, often built on transformer architectures and graph neural networks, to overcome challenges such as modality gaps and missing data. These methods enable more robust and accurate systems across applications including personalized recommendation, medical diagnosis, autonomous driving, and human-computer interaction. The development of modality-agnostic models, capable of handling incomplete or varying data modalities, remains a key area of ongoing investigation.
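A common building block behind the transformer-based fusion methods mentioned above is cross-attention between modalities. The following is a minimal sketch of that idea in PyTorch, assuming image patch features and text token features as the two modalities; all module names, feature dimensions, and the mean-pooling step are illustrative assumptions and are not taken from any of the papers listed below.

```python
# Minimal sketch of cross-attention fusion between two modalities
# (image patch features and text token features). Dimensions and
# pooling choice are illustrative assumptions only.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=512, num_heads=8):
        super().__init__()
        # Project each modality into a shared embedding space to bridge the modality gap.
        self.img_proj = nn.Linear(img_dim, shared_dim)
        self.txt_proj = nn.Linear(txt_dim, shared_dim)
        # Text tokens act as queries attending over image patches.
        self.cross_attn = nn.MultiheadAttention(shared_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(shared_dim)

    def forward(self, img_feats, txt_feats):
        # img_feats: (batch, num_patches, img_dim); txt_feats: (batch, num_tokens, txt_dim)
        img = self.img_proj(img_feats)
        txt = self.txt_proj(txt_feats)
        fused, _ = self.cross_attn(query=txt, key=img, value=img)
        fused = self.norm(fused + txt)   # residual connection on the text stream
        return fused.mean(dim=1)         # pooled multi-modal feature: (batch, shared_dim)


# Usage: fuse 49 image patches with 16 text tokens for a batch of 4 examples.
model = CrossModalFusion()
image_patches = torch.randn(4, 49, 2048)
text_tokens = torch.randn(4, 16, 768)
print(model(image_patches, text_tokens).shape)  # torch.Size([4, 512])
```

Handling missing modalities or varying input sets, as modality-agnostic models aim to do, would require additional machinery (e.g., learned placeholder tokens or attention masking) beyond this sketch.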
Papers
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin
Progressively Modality Freezing for Multi-Modal Entity Alignment
Yani Huang, Xuefeng Zhang, Richong Zhang, Junfan Chen, Jaein Kim
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging
Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He
Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation
Yichen Yan, Xingjian He, Sihan Chen, Shichen Lu, Jing Liu