Object-Centric Representation
Object-centric representation aims to model scenes as compositions of individual objects and their relationships, enabling more robust and generalizable AI systems. Current research focuses on unsupervised learning methods, often employing transformer networks, slot attention mechanisms, and generative models (such as NeRFs) to learn these representations from various data modalities (images, videos, point clouds). By moving beyond pixel-level processing toward a more human-like decomposition of the world, this approach targets tasks that require compositional understanding, such as robotics, visual question answering, and scene prediction. The resulting disentangled representations also improve interpretability and support zero-shot generalization across diverse domains.
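To make the slot-attention mechanism mentioned above concrete, here is a minimal NumPy sketch of its iterative update (after Locatello et al., 2020). This is an illustration, not any paper's implementation: random projections stand in for learned linear maps, and the layer norms, MLPs, and GRU update of the full method are omitted.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, dim=16, iters=3, seed=0):
    """Simplified slot-attention pass.

    inputs: (N, dim) array of per-location features.
    Returns a (num_slots, dim) array of slot vectors, each meant to
    bind to one object's features.
    NOTE: projections are random stand-ins for learned weights.
    """
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    slots = rng.standard_normal((num_slots, dim))

    k = inputs @ Wk          # keys   (N, dim)
    v = inputs @ Wv          # values (N, dim)
    for _ in range(iters):
        q = slots @ Wq                          # queries (num_slots, dim)
        attn_logits = k @ q.T / np.sqrt(dim)    # (N, num_slots)
        # Softmax over the SLOT axis: slots compete for each input
        # location, which is what encourages object-wise decomposition.
        attn = softmax(attn_logits, axis=1)
        # Normalize per slot, then take a weighted mean of the values.
        attn = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = attn.T @ v                      # (num_slots, dim)
    return slots

features = np.random.default_rng(1).standard_normal((64, 16))
slots = slot_attention(features)  # (4, 16): one vector per slot
```

The key design choice is the softmax axis: ordinary cross-attention normalizes over input locations, whereas slot attention normalizes over slots, forcing them to partition the input among themselves.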
Papers
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
Cheng-Chun Hsu, Bowen Wen, Jie Xu, Yashraj Narang, Xiaolong Wang, Yuke Zhu, Joydeep Biswas, Stan Birchfield
Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
Yinxuan Huang, Chengmin Gao, Bin Li, Xiangyang Xue
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Güney
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl Henrik Johansson, Stefan Bauer, Andrea Dittadi