Scene Representation Transformer

Scene Representation Transformers (SRTs) aim to build efficient and accurate 3D scene representations from 2D images or videos, enabling novel view synthesis and other downstream tasks. Current research focuses on improving SRT architectures, for example by incorporating relative pose information for scalability, leveraging external knowledge bases for enhanced scene understanding, and handling dynamic scenes and unposed imagery. These advances matter for applications such as autonomous driving, 3D scene generation, and visual grounding, where they offer improvements in speed, accuracy, and data efficiency over traditional methods.

Papers