Multi-Modal Object Tracking

Multi-modal object tracking (MMOT) aims to locate objects robustly in video sequences by integrating information from multiple sensor sources, such as visual (RGB), depth, thermal, and even language data. Current research emphasizes efficient and generalizable models. These models often use transformer-based architectures, or they adapt pre-trained models through techniques such as prompt tuning and self-distillation so that they can handle diverse modalities and perform better in challenging conditions. The field is important for applications such as autonomous driving and surveillance, where tracking must remain reliable even when an individual sensor is degraded, for example an RGB camera at night or in fog.
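As a rough illustration of what "integrating information from multiple sensor sources" can mean, here is a minimal sketch of late (score-level) fusion. This is only one of several fusion strategies. The transformer and prompt-tuning approaches mentioned above usually fuse at the feature level instead. The sketch assumes that each modality's tracker has already produced a same-sized response map, meaning a grid of target-likelihood scores. The helper names `fuse_response_maps` and `locate`, the reliability weights, and the toy maps are all hypothetical and are not taken from any particular tracker.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a 2D grid of target-likelihood scores for one modality.
ResponseMap = List[List[float]]


def fuse_response_maps(maps: Dict[str, ResponseMap],
                       weights: Dict[str, float]) -> ResponseMap:
    """Weighted sum of same-sized per-modality response maps (hypothetical helper)."""
    names = list(maps)
    rows, cols = len(maps[names[0]]), len(maps[names[0]][0])
    total = sum(weights[n] for n in names)
    fused = [[0.0] * cols for _ in range(rows)]
    for n in names:
        w = weights[n] / total  # normalise so the modality weights sum to 1
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += w * maps[n][r][c]
    return fused


def locate(response: ResponseMap) -> Tuple[int, int]:
    """Return (row, col) of the highest score, taken as the predicted target position."""
    _, row, col = max((v, r, c) for r, line in enumerate(response)
                      for c, v in enumerate(line))
    return row, col


# Toy night-time scene: the RGB map has a spurious peak, the thermal map a clear one.
rgb = [[0.2, 0.3, 0.2],
       [0.2, 0.25, 0.4],
       [0.2, 0.2, 0.2]]
thermal = [[0.1, 0.1, 0.1],
           [0.1, 0.9, 0.2],
           [0.1, 0.1, 0.1]]

fused = fuse_response_maps({"rgb": rgb, "thermal": thermal},
                           {"rgb": 0.3, "thermal": 0.7})
print(locate(rgb))    # (1, 2): RGB alone is drawn to the distractor
print(locate(fused))  # (1, 1): the thermal evidence dominates after fusion
```

The fixed weights stand in for a per-modality reliability estimate. In a real tracker, such weights would typically be predicted per frame rather than hand-set. That adaptivity is what lets a system lean on thermal data when the RGB image is degraded.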

Papers