Universal Image
Universal image embedding research aims to build a single model that can represent and process images across diverse domains and tasks, overcoming the limitations of domain-specific models. Current efforts focus on robust, efficient embedding models, often built on large language models (LLMs) and contrastive learning frameworks, that perform well on downstream applications such as image retrieval, segmentation, and generation. Universality matters because it promises more efficient and adaptable AI systems, with applications ranging from medical image analysis to large-scale visual search.
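To make the contrastive learning and retrieval ideas above concrete, here is a minimal sketch in NumPy. It is not taken from any of the papers listed on this page. It assumes embeddings are already available as plain arrays, uses an InfoNCE-style loss as one common contrastive objective, and ranks a gallery by cosine similarity. The function names (`l2_normalize`, `info_nce_loss`, `retrieve`) and the temperature value of 0.07 are illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce_loss(queries, keys, temperature=0.07):
    """InfoNCE-style contrastive loss (illustrative, not a specific paper's loss).

    Row i of `queries` and row i of `keys` are treated as a positive pair;
    every other row of `keys` acts as a negative for query i.
    """
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    logits = q @ k.T / temperature                # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the target class.
    return float(-np.mean(np.diag(log_probs)))


def retrieve(query, gallery, top_k=3):
    """Return indices of the top_k gallery rows by cosine similarity."""
    scores = l2_normalize(gallery) @ l2_normalize(query)
    return np.argsort(-scores)[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(8, 16))  # stand-in for model outputs
    print("loss, aligned pairs:", info_nce_loss(embeddings, embeddings))
    print("top match for item 2:", retrieve(embeddings[2], embeddings, top_k=1))
```

In a real system the arrays would come from a trained encoder, and the loss would be minimized with an autodiff framework rather than only evaluated as it is here.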
Papers
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang
Language Models are Universal Embedders
Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang