Text-Based Person Search
Text-based person search (TBPS) aims to retrieve images of individuals from a large dataset using only a natural language description, bridging the gap between visual and textual representations. Current research focuses on improving the alignment of image and text features, often employing transformer-based architectures and incorporating techniques like attention mechanisms, masked autoencoders, and multi-modal learning to address challenges such as inter- and intra-identity variations and noisy data. These advancements enhance retrieval accuracy and efficiency, with applications in areas such as law enforcement, security, and multimedia retrieval. The field is also exploring semi-supervised and even unsupervised approaches to reduce reliance on large, manually annotated datasets.
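The retrieval step described above can be illustrated with a minimal sketch. It assumes the image and text encoders have already mapped their inputs into a shared embedding space, and that gallery images are ranked by cosine similarity to the query description, which is a common choice but not the only one. The function names (`l2_normalize`, `rank_gallery`, `rank_k_hit`), the toy embedding values, and the identity labels are hypothetical and serve only as an illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_gallery(text_emb, image_embs):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = l2_normalize(text_emb)
    scores = []
    for idx, img in enumerate(image_embs):
        g = l2_normalize(img)
        scores.append((sum(a * b for a, b in zip(q, g)), idx))
    # Highest similarity first; ties are broken by the lower gallery index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]


def rank_k_hit(ranking, gallery_ids, query_id, k):
    """True if any of the top-k ranked images shares the query's identity."""
    return any(gallery_ids[i] == query_id for i in ranking[:k])


# Toy embeddings standing in for encoder outputs (hypothetical values).
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
gallery_ids = ["p1", "p2", "p2"]
query = [0.1, 0.9, 0.0]  # embedding of a description of person "p2"

ranking = rank_gallery(query, gallery)
print(ranking)
print(rank_k_hit(ranking, gallery_ids, "p2", k=1))
```

Averaging `rank_k_hit` over all query descriptions gives a Rank-k accuracy, one common way to report retrieval accuracy for this kind of task. In a real system the embeddings would come from the trained image and text encoders rather than hand-written lists.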