Bird's Eye View
Bird's-Eye-View (BEV) representation transforms multi-camera images into a top-down view, a representation crucial for autonomous driving and robotics because it provides a unified, geometrically structured understanding of the scene. Current research focuses on improving the accuracy and robustness of BEV generation with transformer-based architectures, often incorporating multimodal sensor fusion (camera, LiDAR, radar) and techniques such as masked attention and Gaussian splatting to enhance feature representation and to handle challenges like occlusion and domain adaptation. This work advances autonomous systems by enabling more reliable perception, particularly in complex or challenging environments, and improves the performance of downstream tasks such as object detection, mapping, and trajectory prediction.
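The core geometric step behind camera-based BEV generation can be illustrated with a minimal sketch: project the ground-plane center of each BEV grid cell into a camera's image using a pinhole model, then sample the image feature at that location. This is an illustrative simplification (single camera, nearest-neighbor sampling, known ground height); the function name and parameters are hypothetical and not taken from any of the papers listed below.

```python
import numpy as np

def bev_from_camera(feat, K, R, t, bev_size=(10, 10), cell=1.0, z=0.0):
    """Fill a BEV grid by projecting ground-plane cell centers into one
    camera's feature map and sampling nearest-neighbor features.

    feat : (H, W, C) image feature map
    K    : (3, 3) pinhole camera intrinsics
    R, t : world-to-camera rotation (3, 3) and translation (3,)
    """
    H, W, C = feat.shape
    nx, ny = bev_size
    bev = np.zeros((nx, ny, C), dtype=feat.dtype)
    for i in range(nx):
        for j in range(ny):
            # BEV cell center in world coordinates, on the ground plane z
            pw = np.array([(i - nx / 2 + 0.5) * cell,
                           (j - ny / 2 + 0.5) * cell, z])
            pc = R @ pw + t          # world -> camera coordinates
            if pc[2] <= 0:           # point is behind the camera
                continue
            uvw = K @ pc             # perspective projection
            u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
            ui, vi = int(round(u)), int(round(v))
            if 0 <= vi < H and 0 <= ui < W:
                bev[i, j] = feat[vi, ui]
    return bev
```

Multi-camera methods repeat this projection per camera and fuse the sampled features; transformer-based approaches replace the fixed nearest-neighbor lookup with learned (e.g. deformable or masked) attention between BEV queries and image features.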
Papers
VoxelFormer: Bird's-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection
Zhuoling Li, Chuanrui Zhang, Wei-Chiu Ma, Yipin Zhou, Linyan Huang, Haoqian Wang, Ser-Nam Lim, Hengshuang Zhao
Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu