DiNO Mix
DINO (self-distillation with no labels) and its extensions are a family of self-supervised methods for learning visual representations. The similarly named DINO (DETR with Improved DeNoising Anchor Boxes) is a separate object detection model that shares the acronym. Current research adapts DINO's feature extraction capabilities to applications such as medical image analysis, open-vocabulary object detection, and robot manipulation, often through techniques like feature mixing and adapter learning. This work is significant because it shows that self-supervised learning can help address data scarcity in specialized domains and improve the efficiency and generalizability of computer vision models.
Papers
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang
Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search
Kirill Paramonov, Jia-Xing Zhong, Umberto Michieli, Jijoong Moon, Mete Ozay