Spatiotemporal Transformer
Spatiotemporal transformers are deep learning models for data with both spatial and temporal dependencies, built to improve the prediction and understanding of dynamic systems. Current research applies them to video analysis (action detection, video generation, inpainting), environmental monitoring (data imputation), and financial forecasting. The architectures are often based on vision transformers and combine attention with recurrent units to capture spatiotemporal relationships. The approach is well suited to high-dimensional, time-varying data, with reported performance gains in fields ranging from healthcare to autonomous driving.
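To make the idea of attention over both space and time concrete, here is a minimal NumPy sketch of one common design, divided space-time self-attention, in which tokens first attend across frames and then across positions within a frame. It is an illustration under stated assumptions, not the method of any paper listed below: the function names, the `(T, N, D)` layout, the single attention head, and the random weights are all hypothetical, and layer normalization, MLP sublayers, and positional encodings are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x has shape (..., n, d); attention runs over the n axis,
    independently for every leading (batch) index.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def divided_space_time_block(x, params):
    """Hypothetical block: temporal attention, then spatial attention.

    x has shape (T, N, D): T frames, N spatial tokens per frame, D channels.
    """
    # Temporal step: each spatial location attends across the T frames.
    xt = np.swapaxes(x, 0, 1)                      # (N, T, D)
    xt = xt + self_attention(xt, *params["time"])  # residual connection
    x = np.swapaxes(xt, 0, 1)                      # back to (T, N, D)

    # Spatial step: each frame attends across its N tokens.
    x = x + self_attention(x, *params["space"])    # residual connection
    return x


# Toy input: 4 frames, 6 tokens per frame, 8 channels (assumed sizes).
rng = np.random.default_rng(0)
T, N, D = 4, 6, 8
x = rng.normal(size=(T, N, D))
params = {
    name: tuple(rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3))
    for name in ("time", "space")
}
out = divided_space_time_block(x, params)
print(out.shape)
```

Splitting attention this way keeps each attention matrix at size T×T or N×N rather than (T·N)×(T·N), which is one reason factorized designs are popular for video-scale inputs. Joint attention over all T·N tokens is the alternative when the sequence is short enough to afford it.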
Papers
Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet
Shihao Zou, Yuanlu Xu, Chao Li, Lingni Ma, Li Cheng, Minh Vo
SCouT: Synthetic Counterfactuals via Spatiotemporal Transformers for Actionable Healthcare
Bhishma Dedhia, Roshini Balasubramanian, Niraj K. Jha