Human-Centric Visual
Human-centric visual research develops computer vision systems that understand and interpret images from a human perspective, prioritizing the detection and analysis of human actions, interactions, and contextual information. Current work relies largely on transformer-based models, often combined with vision-language models and autoregressive techniques, to generate and exploit human-centric visual cues such as body language and environmental context for tasks like human-object interaction detection and 360-degree image generation. Addressing biases in existing datasets, particularly in geographic representation, is another critical area of investigation, with the aim of improving the fairness and generalizability of these models. Together, these efforts aim to make computer vision systems more accurate and robust across diverse applications, including virtual reality and human-computer interaction.