Monocular Input

Monocular input, using a single camera to infer 3D information, is a core challenge in computer vision, aiming to reconstruct complex scenes and objects from limited visual data. Current research focuses on developing robust deep learning models, including transformers and convolutional neural networks, often incorporating additional sensor data like IMUs to improve accuracy and real-time performance. These advancements enable applications ranging from human avatar animation and hand tracking to 3D scene reconstruction for robotics and medical imaging (e.g., myopia screening), demonstrating the significance of monocular vision for diverse fields.

Papers