Scene Understanding
Scene understanding in computer vision aims to enable machines to interpret and reason about visual scenes in a way that mirrors human perception. Current research focuses heavily on integrating multiple data modalities (e.g., audio, depth, video) and on applying architectures such as transformers and neural radiance fields to object detection, segmentation, and scene graph generation, often within specific application domains such as autonomous driving and robotics. These advances matter for systems that must operate reliably in the real world, from autonomous vehicles navigating complex environments to robots interacting with human-centered spaces. Benchmark datasets and standardized evaluation metrics are also being actively developed to support reliable comparisons between approaches.
Papers
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models
Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores Fernandez, Jaron Lanier
SANPO: A Scene Understanding, Accessibility and Human Navigation Dataset
Sagar M. Waghmare, Kimberly Wilber, Dave Hawkey, Xuan Yang, Matthew Wilson, Stephanie Debats, Cattalyya Nuengsigkapian, Astuti Sharma, Lars Pandikow, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives
Karolina Seweryn, Anna Wróblewska, Szymon Łukasik
PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics
Claus Smitt, Michael Halstead, Patrick Zimmer, Thomas Läbe, Esra Guclu, Cyrill Stachniss, Chris McCool
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving
Ali Keysan, Andreas Look, Eitan Kosman, Gonca Gürsun, Jörg Wagner, Yu Yao, Barbara Rakitsch