DiNO Mix
DINO (self-distillation with no labels) and its extensions are a family of self-supervised methods for visual representation learning. The name is also used by an unrelated DETR-based object detector, DINO (DETR with Improved DeNoising Anchor Boxes). Current research adapts DINO's feature extraction capabilities to applications including medical image analysis, open-vocabulary object detection, and robot manipulation, often through techniques such as feature mixing and adapter learning. This work shows that self-supervised learning can ease data scarcity in specialized domains and improve the efficiency and generalizability of computer vision models.
Papers
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Imam, Rufael Fedaku Marew, Jameel Hassan, Mustansar Fiaz, Alham Fikri Aji, Hisham Cholakkal
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti, Lorenzo Bianchi, Nicola Messina, Fabio Carrara, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Rita Cucchiara
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang
Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search
Kirill Paramonov, Jia-Xing Zhong, Umberto Michieli, Jijoong Moon, Mete Ozay