Visual Relation
Visual relation understanding is the area of computer vision concerned with enabling machines to recognize how objects in images and videos relate to one another, much as human visual perception does. Current research focuses on improving the accuracy and efficiency of visual relation detection and generation with deep learning architectures such as transformers, graph neural networks, and diffusion models, often combined with techniques like active perception and knowledge graphs. Applications range from scene understanding and image captioning to more complex tasks such as robotic manipulation and medical image analysis.
Papers
ReVersion: Diffusion-Based Relation Inversion from Images
Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin C.K. Chan, Ziwei Liu
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang, Yawei Luo, Zhiqing Chen, Tao Jiang, Lei Chen, Yi Yang, Jun Xiao