Open Vocabulary Semantic Segmentation

Open-vocabulary semantic segmentation (OVSS) assigns a semantic label to every pixel of an image without relying on a fixed, pre-defined category set. The candidate labels are typically supplied as free-form text at inference time, so a model can recognize objects that were never labeled during training. Current research focuses on adapting vision-language models such as CLIP, often combined with other foundation models (e.g., SAM, DINO), and uses techniques such as multi-resolution processing, pseudo-mask generation, and contrastive learning to improve accuracy and efficiency. Because the label set can change without retraining, OVSS is promising for applications such as autonomous driving, remote sensing, and medical image analysis, where flexible and robust image understanding matters.
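The core idea behind many CLIP-based approaches can be illustrated with a minimal sketch: each image patch is labeled with the text query whose embedding is most similar to the patch embedding. The example below is not taken from any of the listed papers. The function name `segment_open_vocab`, the temperature value, and the random arrays standing in for CLIP patch and text embeddings are all assumptions made for illustration; a real pipeline would obtain these embeddings from a vision-language model.

```python
import numpy as np

def segment_open_vocab(patch_emb, text_emb, temperature=0.01):
    """Label each patch with its most similar text query.

    patch_emb: (H, W, D) dense image embeddings (stand-in for CLIP patch tokens).
    text_emb:  (K, D) embeddings of K free-form class names.
    Returns an (H, W) label map and (H, W, K) per-class probabilities.
    """
    # L2-normalize so the dot product equals cosine similarity.
    p = patch_emb / np.linalg.norm(patch_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = (p @ t.T) / temperature              # (H, W, K)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs

# Toy data (assumption): random vectors play the role of text embeddings.
rng = np.random.default_rng(0)
class_names = ["sky", "road", "car"]              # hypothetical vocabulary
text_emb = rng.normal(size=(len(class_names), 16))

# A 4x4 patch grid whose top half resembles "sky" and bottom half "road".
target = np.array([[0] * 4] * 2 + [[1] * 4] * 2)
patch_emb = text_emb[target] + 0.05 * rng.normal(size=(4, 4, 16))

labels, probs = segment_open_vocab(patch_emb, text_emb)
```

Changing the vocabulary only requires a different `text_emb` matrix; nothing is retrained, which is what makes the approach open-vocabulary. Methods in the list below differ mainly in how they obtain cleaner dense embeddings or masks before this matching step.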

Papers

July 17, 2024

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang
Semantic Segmentation Vision Language Model Open Vocabulary Semantic Segmentation Segmentation Quality CLIP Representation

July 11, 2024

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su
Full Potential Single CLIP Segmentation Accuracy Open Vocabulary Semantic Segmentation Segmentation Benchmark Free Semantic Segmentation Patch Correlation

July 6, 2024

A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation
Monika Wysoczańska, Antonin Vobecky, Amaia Cardiel, Tomasz Trzciński, Renaud Marlet, Andrei Bursuc, Oriane Siméoni
Study Feature Open World Image Text Pair Natural Language Query Open Vocabulary Semantic Segmentation Image Region Text Contrastive Learning

July 3, 2024

A Unified Framework for 3D Scene Understanding
Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai
Unified Framework 3D Scene 3D Segmentation 3D Scene Understanding Open Vocabulary Semantic Segmentation

June 14, 2024

Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu, Yuanjie Shao, Nong Sang, Changxin Gao
Semantic Segmentation Open Vocabulary Semantic Segmentation Semantic Mask

June 13, 2024

Auto-Vocabulary Segmentation for LiDAR Points
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald
Point Cloud Open Ended Open Vocabulary Semantic Segmentation LiDAR Point

May 29, 2024

Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Wei Shen
Semantic Segmentation Parameter Efficient Fine Tuning Vision Language Foundation Model Open Vocabulary Semantic Segmentation HyperSpherical Energy

April 12, 2024

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Sina Hajimiri, Ismail Ben Ayed, Jose Dolz
Semantic Segmentation Zero Shot Open Vocabulary Semantic Segmentation Thy Neighbor Semantic Segmentation Benchmark Pay Attention CLIP Level

April 9, 2024

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
Semantic Segmentation Action Free Offline Open Vocabulary Semantic Segmentation Image Caption Pair Discriminative Region

March 30, 2024

Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang, Rui Sun, Naisong Luo, Yuwen Pan, Tianzhu Zhang
Semantic Segmentation Foundation Model New Perspective Open Vocabulary Semantic Segmentation Image Matching Modal Similarity Cross Modal Matching

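March 17, 2024

TAG: Guidance-free Open-Vocabulary Semantic Segmentation

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation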

March 6, 2024

Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
Yajie Liu, Pu Ge, Qingjie Liu, Di Huang
Open Vocabulary Semantic Segmentation Text Supervision Granularity Alignment Fine Grained Cross Modal Alignment Multi Grained Contrastive

February 21, 2024

Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation
Jialei Chen, Daisuke Deguchi, Chenkai Zhang, Hiroshi Murase
Semantic Segmentation Open Vocabulary Semantic Segmentation Semantic Image

January 22, 2024

Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
Semantic Segmentation Model Semantic Label Open Vocabulary Semantic Segmentation Pixel Level Alignment Image Text Datasets

December 19, 2023

CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick Pérez
Semantic Segmentation Single CLIP Open Vocabulary Semantic Segmentation DiNO Mix Dense Vision Task

December 7, 2023

Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano, Martin R. Oswald
Semantic Segmentation Open Ended Open Vocabulary Semantic Segmentation Open Vocabulary Segmentation

November 28, 2023

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Jiayun Luo, Siddhesh Khandelwal, Leonid Sigal, Boyang Li
Vision Language Model Structured Dropout Open Vocabulary Semantic Segmentation Segmentation Annotation

November 27, 2023

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
Semantic Segmentation Pre Trained Vision Language Model Encoder Decoder Open Vocabulary Semantic Segmentation GIT Net Image to Text Mapping

November 19, 2023

Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang, Xiaoqi Zhao, Jiaming Zuo, Lihe Zhang, Huchuan Lu
Open World Object Segmentation Open Vocabulary Semantic Segmentation Open Vocabulary Dense Prediction
