Human-Centric Scene Understanding
Human-centric scene understanding aims to enable computers to interpret visual scenes by focusing on human actions, interactions, and relationships with objects. Current research emphasizes improving the accuracy and robustness of human-object interaction (HOI) detection, often using large language models (LLMs) to strengthen reasoning and techniques such as unsupervised learning from synthetic data to address data scarcity. These advances matter for developing more capable robots and autonomous systems, improving accessibility for people with disabilities, and creating more intuitive human-computer interfaces. Building large-scale, multi-modal datasets is also a key focus, since they support the training and evaluation of these increasingly complex models.