Text Based Person

Text-based person re-identification (ReID) retrieves images of a specific person from a gallery given a natural-language description, bridging visual and linguistic data. Current research focuses on improving the alignment of image and text features through techniques such as bi-directional embedding alignment and contrastive learning, typically with deep learning architectures such as dual encoders and multi-branch networks. The field matters for multimodal understanding and has practical applications in security, social media analysis, and video retrieval, particularly when a query image of the person is unavailable or visual identification alone is unreliable.
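To make the idea of bi-directional contrastive alignment concrete, here is a minimal sketch in plain Python. It assumes a dual-encoder setup in which an image encoder and a text encoder have already produced one embedding per sample, and that row i of the image batch matches row i of the text batch. The function names (`bidirectional_contrastive_loss`, `retrieve`) and the temperature value 0.07 are illustrative assumptions, not taken from any particular paper, and real ReID batches often contain several images of the same identity, which this sketch ignores.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_matrix(image_embs, text_embs):
    # Entry [i][j] is the cosine similarity of image i and text j.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: average of image-to-text and text-to-image losses.
    sims = cosine_similarity_matrix(image_embs, text_embs)
    logits_i2t = [[s / temperature for s in row] for row in sims]
    logits_t2i = [list(col) for col in zip(*logits_i2t)]  # transpose
    return 0.5 * (cross_entropy_diagonal(logits_i2t)
                  + cross_entropy_diagonal(logits_t2i))


def retrieve(text_emb, gallery_embs):
    # Rank gallery image indices by similarity to one text query, best first.
    sims = cosine_similarity_matrix([text_emb], gallery_embs)[0]
    return sorted(range(len(sims)), key=lambda k: sims[k], reverse=True)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(bidirectional_contrastive_loss(images, matched_texts))
    print(bidirectional_contrastive_loss(images, swapped_texts))
    print(retrieve([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))
```

The loss is small when each description is closest to its own image and large when the pairs are mismatched, which is the signal that pulls the two embedding spaces together during training. At query time only the ranking step is needed: the description is embedded once and compared against every gallery image.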

Papers