Cross-Modality Matching
Cross-modality matching aligns and compares data from different sensory modalities, such as images and text, visible and infrared imagery, or images and point clouds. Current research emphasizes robust alignment algorithms, often built on contrastive learning, optimal transport, or pre-trained vision-language models such as CLIP, to bridge the "modality gap" and improve matching accuracy. This work is crucial for applications ranging from person re-identification and medical image analysis to zero-shot learning and image retrieval, enabling more capable and versatile AI systems. Notable recent advances include techniques that generate homogeneous intermediate modalities and multi-granularity feature extraction.
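To make the contrastive-learning idea concrete, below is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss for matching paired embeddings from two modalities. It is illustrative only: the function name, embedding dimensions, and temperature value are assumptions, and the random tensors stand in for the outputs of real modality-specific encoders.

```python
# Illustrative sketch of CLIP-style symmetric contrastive matching.
# Names and hyperparameters here are hypothetical placeholders.
import torch
import torch.nn.functional as F

def contrastive_matching_loss(image_emb: torch.Tensor,
                              text_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) embeddings from each modality,
    where row i of each tensor corresponds to the same sample.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits: entry (i, j) compares image i to text j.
    logits = image_emb @ text_emb.t() / temperature

    # Ground-truth matches lie on the diagonal of the logits matrix.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example usage with random embeddings standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(contrastive_matching_loss(img, txt))
```

Pulling matched pairs together while pushing mismatched pairs apart in a shared embedding space is the standard mechanism by which contrastive objectives narrow the modality gap; methods built on optimal transport or homogeneous intermediate modalities modify how the cross-modal correspondences are established before or during this alignment step.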