Paper ID: 2401.06960

Transformer for Object Re-Identification: A Survey

Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du

Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes, and is a widely researched task in computer vision. For a long time, the field was predominantly driven by deep learning based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies on Transformer-based Re-ID, which have repeatedly broken performance records and driven significant progress in the field. Offering a powerful, flexible, and unified solution, Transformers cater to a wide array of Re-ID tasks with unparalleled efficacy. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. Categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we analyze in detail the advantages Transformers demonstrate in addressing a multitude of challenges across these domains. Given the growing interest in unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single-modal and cross-modal tasks. For the under-explored task of animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformers to this task and to facilitate future research. Finally, we discuss several important yet under-investigated open issues in the era of large foundation models. We believe this survey will serve as a new handbook for researchers in this field. A periodically updated website will be available at this https URL

Submitted: Jan 13, 2024