Human-Object Pair
Human-object interaction (HOI) detection localizes humans and objects in images, pairs them, and classifies the interaction within each pair, typically expressed as a (human, verb, object) triplet such as (person, ride, bicycle). Current research focuses on improving the accuracy and efficiency of HOI detection, particularly for rare or unseen interactions, using techniques such as transformer-based models, vision-language models (e.g., CLIP), and graph convolutional networks. The field is central to high-level scene understanding and visual reasoning, with applications ranging from robotics and assistive technologies to content analysis and human-computer interaction. Considerable effort also goes into reducing reliance on extensive manual annotation through weakly supervised and zero-shot learning approaches.
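To make the pairing-and-classification task concrete, here is a minimal sketch of a generic two-stage formulation. It assumes an off-the-shelf object detector has already produced labeled, scored boxes. Every detected person is paired with every other detection, and each pair is scored per verb as the product of the human, object, and interaction confidences, a scoring rule used by several two-stage detectors. The `Detection` and `HOITriplet` types, `pair_and_classify`, and the lookup-table `toy_scorer` are hypothetical illustrations, not any specific published method. A real system would replace `toy_scorer` with a learned interaction classifier, such as a transformer, graph network, or CLIP-based head.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    label: str    # object category from the (assumed) upstream detector
    box: Box
    score: float  # detector confidence in [0, 1]


@dataclass
class HOITriplet:
    human: Detection
    obj: Detection
    verb: str
    score: float


def pair_and_classify(
    detections: List[Detection],
    interaction_scores: Callable[[Detection, Detection], Dict[str, float]],
    threshold: float = 0.1,
) -> List[HOITriplet]:
    """Enumerate human-object pairs and score each candidate verb.

    Hypothetical two-stage sketch: the final score is
    human_conf * object_conf * verb_conf.
    """
    humans = [d for d in detections if d.label == "person"]
    triplets: List[HOITriplet] = []
    # Objects may include other people (e.g. "hug person"), so pair each
    # human with every detection except itself.
    for h, o in product(humans, detections):
        if o is h:
            continue
        for verb, verb_conf in interaction_scores(h, o).items():
            score = h.score * o.score * verb_conf
            if score >= threshold:
                triplets.append(HOITriplet(h, o, verb, score))
    # Highest-confidence triplets first.
    return sorted(triplets, key=lambda t: t.score, reverse=True)


def toy_scorer(h: Detection, o: Detection) -> Dict[str, float]:
    # Hypothetical stand-in for a learned interaction classifier:
    # verb confidences looked up by object category only.
    table = {"bicycle": {"ride": 0.9, "hold": 0.3}, "cup": {"drink_with": 0.8}}
    return table.get(o.label, {})


dets = [
    Detection("person", (10, 10, 60, 120), 0.95),
    Detection("bicycle", (5, 60, 80, 140), 0.90),
    Detection("cup", (200, 50, 220, 80), 0.40),
]
triplets = pair_and_classify(dets, toy_scorer)
for t in triplets:
    print(t.human.label, t.verb, t.obj.label, round(t.score, 3))
```

Enumerating all pairs grows quadratically with the number of detections. This is one commonly cited motivation for transformer-based detectors that predict triplets directly instead of pairing detections after the fact.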