Vision-Language
Vision-language research develops models that jointly understand visual and textual information, bridging computer vision and natural language processing. Current work emphasizes improving robustness against adversarial attacks, increasing efficiency through techniques such as token pruning and parameter-efficient fine-tuning, and tackling noisy training data and complex reasoning tasks. The field underpins applications such as image captioning, visual question answering, and medical image analysis, with impact on domains ranging from healthcare to autonomous driving.
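To make the parameter-efficient fine-tuning idea concrete, here is a minimal sketch of a LoRA-style adapter in PyTorch: the pretrained weight is frozen and only a low-rank update is trained. The `LoRALinear` class, layer sizes, and hyperparameters are illustrative assumptions for this digest, not the method of any paper listed below.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA-style wrapper: y = W x + (alpha / r) * B(A(x)).

    The pretrained linear layer W is frozen; only the low-rank
    factors A and B are trained, so the trainable parameter count
    scales with r rather than with the full weight matrix.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Toy usage: adapt one projection layer of a stand-in encoder.
layer = nn.Linear(768, 768)                  # stands in for a pretrained weight
adapted = LoRALinear(layer, r=8)
y = adapted(torch.randn(4, 768))

trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable params: {trainable}/{total}")  # only ~2% here is trainable
```

In a real setup the wrapper would typically be applied to the attention projections of the vision and text encoders, leaving the rest of the model untouched; the zero-initialized `B` matrix ensures fine-tuning starts exactly from the pretrained behavior.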
Papers
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning
Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan Yuan, Yang Zhan, Zhitong Xiong