Paper ID: 2411.10028

MOT\_FCG++: Enhanced Representation of Motion and Appearance Features

Yanzhao Fang

The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames while maintaining a unique identity for each object. Most existing methods rely on the spatial motion features and appearance embedding features of objects detected in consecutive frames. Representing the spatial and appearance features of long trajectories effectively and robustly has become a critical factor in MOT performance. We propose a novel approach to appearance and spatial feature representation that improves upon the clustering-based association method MOT\_FCG. For spatial motion features, we propose Diagonal Modulated GIoU, which more accurately represents the relationship between the positions and shapes of objects. For appearance features, we use a dynamic appearance representation that incorporates confidence information, making trajectory appearance features more robust and global. Building on the baseline MOT\_FCG, our method achieves 76.1 HOTA, 80.4 MOTA, and 81.3 IDF1 on the MOT17 validation set, and also achieves competitive performance on the MOT20 and DanceTrack validation sets.
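The abstract names two building blocks without giving their formulas, so the sketch below only illustrates the general ideas under stated assumptions. `giou` is the standard Generalized IoU (IoU minus the fraction of the smallest enclosing box not covered by the union), which Diagonal Modulated GIoU extends; the diagonal modulation itself is not specified here and is not shown. `update_appearance` and its `base_alpha` parameter are hypothetical names for one possible confidence-weighted exponential moving average of a track's embedding, in which a more confident detection contributes more to the updated feature. It is not the paper's formula.

```python
import math


def giou(box_a, box_b):
    """Standard Generalized IoU for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest box enclosing both inputs.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    if c_area <= 0:
        return iou
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (c_area - union) / c_area


def update_appearance(track_emb, det_emb, conf, base_alpha=0.9):
    """Hypothetical confidence-weighted EMA of a track's appearance embedding.

    conf is clamped to [0, 1]. conf = 1 gives alpha = base_alpha (largest
    update); conf = 0 gives alpha = 1 (the detection is ignored).
    """
    conf = min(max(conf, 0.0), 1.0)
    alpha = base_alpha + (1.0 - base_alpha) * (1.0 - conf)
    mixed = [alpha * t + (1.0 - alpha) * d for t, d in zip(track_emb, det_emb)]
    # L2-normalize so embeddings stay comparable with cosine similarity.
    norm = math.sqrt(sum(v * v for v in mixed))
    return [v / norm for v in mixed] if norm > 0 else mixed


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
    print(update_appearance([1.0, 0.0], [0.0, 1.0], conf=0.8))
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes: two disjoint boxes score below zero, and the score drops further as they move apart.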

Submitted: Nov 15, 2024