Video Transformer
Video transformers are deep learning models that apply the attention mechanism of transformer architectures to video data, targeting tasks such as action recognition, segmentation, and generation. Current research focuses on improving efficiency, generalizing across domains and datasets, and incorporating multimodal information (e.g., audio, pose) to increase accuracy and robustness. These advances matter for applications including healthcare (remote physiological measurement), robotics (manipulation), and video editing (inpainting, generation), where they enable more accurate and efficient analysis and manipulation of video content.
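To make the role of attention concrete, here is a minimal sketch of one common way a video can be turned into tokens and passed through self-attention. It is an illustration only, not the method of any listed paper. The tubelet tokenization, the function names (`tubelet_tokens`, `self_attention`), the toy clip size, and the use of identity projections (Q = K = V) are all assumptions. Real models learn projection matrices, use multiple heads, and run on a tensor library rather than plain Python lists.

```python
import math
import random

def tubelet_tokens(video, t, p):
    """Split a grayscale video[T][H][W] into non-overlapping t x p x p tubelets
    and flatten each tubelet into one token vector (hypothetical helper)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    tokens = []
    for t0 in range(0, T - t + 1, t):
        for y0 in range(0, H - p + 1, p):
            for x0 in range(0, W - p + 1, p):
                tokens.append([video[t0 + dt][y0 + dy][x0 + dx]
                               for dt in range(t)
                               for dy in range(p)
                               for dx in range(p)])
    return tokens

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Assumption: identity projections, so Q = K = V = tokens."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token across space and time.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Each output is an attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights

# Toy clip: 4 frames of 8 x 8 pixels with random intensities.
random.seed(0)
video = [[[random.random() for _ in range(8)] for _ in range(8)] for _ in range(4)]

tokens = tubelet_tokens(video, t=2, p=4)  # 2 x 2 x 2 = 8 tokens, each of length 2*4*4 = 32
attended, weights = self_attention(tokens)
```

Because every token attends to every other token, the cost grows quadratically with the number of tubelets, which is one reason efficiency is an active research focus for these models.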