Object Level
Object-level understanding in computer vision aims to represent and reason about individual objects within scenes, going beyond simple object detection to capture their properties, relationships, and interactions. Current research relies heavily on transformer-based architectures, often combining visual and textual data (multi-modal learning) and applying techniques such as knowledge distillation and contrastive learning to improve performance and generalization. This focus on object-centric representation is crucial for applications such as autonomous driving, robotics, and image understanding, enabling more robust and context-aware systems.
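To make the contrastive-learning point concrete, below is a minimal sketch of an InfoNCE-style objective over object-level embeddings, where matched visual and textual embeddings of the same object are treated as positives and all other pairs in the batch as negatives. The function name, dimensions, and temperature are illustrative assumptions, not taken from the papers listed here.

```python
# Minimal sketch (assumed setup): contrastive alignment of per-object
# visual and textual embeddings, as commonly used in multi-modal
# object-centric learning. Requires PyTorch.
import torch
import torch.nn.functional as F

def object_contrastive_loss(visual_emb: torch.Tensor,
                            text_emb: torch.Tensor,
                            temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss: the i-th visual and i-th text embedding form a
    positive pair; all other in-batch pairs serve as negatives."""
    # Normalize so the dot product is cosine similarity.
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                  # (N, N) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric cross-entropy: visual-to-text and text-to-visual directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

if __name__ == "__main__":
    # Toy example: 8 objects with 256-dimensional embeddings per modality.
    vis = torch.randn(8, 256)
    txt = torch.randn(8, 256)
    print(object_contrastive_loss(vis, txt))
```

This is a generic illustration of the contrastive objectives mentioned above, not the specific method of either listed paper.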
Papers
Grouped Discrete Representation Guides Object-Centric Learning
Rongzhen Zhao, Vivienne Wang, Juho Kannala, Joni Pajarinen
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving
Ran Tian, Boyi Li, Xinshuo Weng, Yuxiao Chen, Edward Schmerling, Yue Wang, Boris Ivanovic, Marco Pavone