Referring Expression Comprehension

Referring expression comprehension (REC) is the task of locating the object in an image that a natural language description refers to, bridging visual and linguistic understanding. Current research aims to improve the accuracy and efficiency of REC models, exploring architectures such as transformers and graph-based methods. It also addresses challenges such as complex expressions, noisy datasets, and computational cost, using techniques such as parameter-efficient fine-tuning and dynamic reasoning. Advances in REC matter for building robust multimodal AI systems, with applications in robotics, image retrieval, and human-computer interaction.
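To make "accuracy" concrete, here is a minimal sketch of one common way to score REC predictions, assuming the model outputs one axis-aligned bounding box per expression. The names `Box`, `iou`, and `accuracy_at_iou` are illustrative, and the 0.5 overlap threshold is a widely used convention rather than something stated in the summary above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Degenerate or inverted boxes count as empty.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    overlap = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                  min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - overlap
    return overlap / union if union > 0 else 0.0


def accuracy_at_iou(predictions: Sequence[Box],
                    ground_truths: Sequence[Box],
                    threshold: float = 0.5) -> float:
    """Fraction of expressions whose predicted box overlaps the target enough."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical outputs for two expressions, e.g. "the mug left of the laptop".
    predicted = [Box(0, 0, 2, 2), Box(0, 0, 2, 2)]
    annotated = [Box(0, 0, 2, 2), Box(1, 1, 3, 3)]
    print(accuracy_at_iou(predicted, annotated))
```

In this toy case the first prediction matches its target exactly and the second overlaps it only slightly, so half of the expressions count as correctly grounded.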

Papers