Multi-Modal Features
Multi-modal feature research focuses on effectively integrating information from diverse data sources (e.g., images, text, audio, sensor data) to improve the performance of machine learning models. Current research emphasizes efficient fusion techniques, often employing transformer-based architectures and graph neural networks, to overcome challenges such as modality gaps and missing data. By enabling more robust and accurate systems, this field advances applications including personalized recommendation, medical diagnosis, autonomous driving, and human-computer interaction. The development of modality-agnostic models, capable of handling incomplete or varying data modalities, is a key area of ongoing investigation.
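To make the fusion idea above concrete, below is a minimal sketch of transformer-style cross-attention fusion between an image stream and a text stream, with a learnable placeholder token standing in for a missing modality. The module name `CrossModalFusion`, the feature dimensions, and the pooling choice are illustrative assumptions for this sketch, not the method of any paper listed below.

```python
# Minimal sketch of cross-attention fusion of two modalities (illustrative only;
# not the architecture of any specific paper in this list).
from typing import Optional

import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Cross-attention: image tokens attend to text tokens, and vice versa.
        self.img_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Learnable token used when the text modality is absent, so the same
        # model still runs on incomplete inputs (modality-agnostic behavior).
        self.missing_txt = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img_feats: torch.Tensor,
                txt_feats: Optional[torch.Tensor] = None) -> torch.Tensor:
        # img_feats: (batch, num_img_tokens, dim)
        # txt_feats: (batch, num_txt_tokens, dim), or None if the modality is missing
        if txt_feats is None:
            txt_feats = self.missing_txt.expand(img_feats.size(0), -1, -1)
        img_enriched, _ = self.img_to_txt(img_feats, txt_feats, txt_feats)
        txt_enriched, _ = self.txt_to_img(txt_feats, img_feats, img_feats)
        # Pool each enriched stream and concatenate into one joint feature.
        fused = torch.cat([
            self.norm(img_enriched).mean(dim=1),
            self.norm(txt_enriched).mean(dim=1),
        ], dim=-1)
        return fused  # (batch, 2 * dim)


if __name__ == "__main__":
    fusion = CrossModalFusion(dim=256)
    imgs = torch.randn(2, 49, 256)   # e.g. 7x7 patch features from an image encoder
    text = torch.randn(2, 16, 256)   # e.g. 16 token embeddings from a text encoder
    print(fusion(imgs, text).shape)  # torch.Size([2, 512])
    print(fusion(imgs, None).shape)  # still works when the text modality is missing
```

The learnable placeholder token is one simple way to keep a single model usable when a modality is unavailable, which is the kind of incomplete-data handling the modality-agnostic work mentioned above targets; real systems may instead impute the missing features or route around them with expert mixtures.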
Papers
MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt
Yuhao Wang, Xuehu Liu, Tianyu Yan, Yang Liu, Aihua Zheng, Pingping Zhang, Huchuan Lu
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification
Yuhao Wang, Yang Liu, Aihua Zheng, Pingping Zhang
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin
Progressively Modality Freezing for Multi-Modal Entity Alignment
Yani Huang, Xuefeng Zhang, Richong Zhang, Junfan Chen, Jaein Kim