Modality-Specific Information
Modality-specific information is the information carried by only one data type (modality), such as images, text, or audio, as opposed to the information shared across the modalities combined in a multimodal learning task. Current research focuses on methods that exploit both the shared and the modality-specific information, often using contrastive learning, disentangled representations, or dynamic fusion networks within transformer or other deep learning architectures. This line of work aims to improve the robustness and accuracy of multimodal systems by addressing challenges such as missing data and modality discrepancies, with implications for applications ranging from medical diagnosis and person re-identification to language understanding and action quality assessment.
Papers
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang
Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval
Yiming Wu, Hangfei Li, Fangfang Wang, Yilong Zhang, Ronghua Liang