Video Transformer
Video transformers are deep learning models that apply the attention mechanism of transformer architectures to video data, targeting tasks such as action recognition, segmentation, and generation. Current research focuses on improving efficiency, generalizing across domains and datasets, and incorporating multimodal information (e.g., audio, pose) to improve accuracy and robustness. Applications include healthcare (remote physiological measurement), robotics (manipulation), and video editing (inpainting, generation), where these advances enable more accurate and efficient analysis and manipulation of video content.
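To make the idea of attention over video concrete, here is a minimal sketch in plain Python. It is not taken from any of the papers listed on this page. It assumes one common design: the clip is cut into small spatiotemporal blocks ("tubelets"), each block is flattened into a token, and a single self-attention head lets every token attend to every other token across space and time. The function names (tubelet_tokens, self_attention), the clip size, and the random weights are all hypothetical choices for illustration. Positional embeddings, multiple heads, and training are omitted.

```python
import math
import random


def tubelet_tokens(video, t, p):
    """Split a grayscale video [T][H][W] into flattened t x p x p tubelets."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    assert T % t == 0 and H % p == 0 and W % p == 0, "sizes must divide evenly"
    tokens = []
    for t0 in range(0, T, t):
        for y0 in range(0, H, p):
            for x0 in range(0, W, p):
                tokens.append([video[t0 + dt][y0 + dy][x0 + dx]
                               for dt in range(t)
                               for dy in range(p)
                               for dx in range(p)])
    return tokens


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over token rows x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]            # transpose of k
    scores = matmul(q, k_t)                         # token-to-token similarity
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def rand_matrix(rng, rows, cols):
    """Small random projection matrix (stands in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical toy clip: 4 frames of 8 x 8 pixels.
video = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(4)]

tokens = tubelet_tokens(video, t=2, p=4)            # 2 x 2 x 2 = 8 tokens
dim_in, dim = len(tokens[0]), 16                    # 2*4*4 = 32 values per token
wq, wk, wv = (rand_matrix(rng, dim_in, dim) for _ in range(3))

out, weights = self_attention(tokens, wq, wk, wv)
print(len(tokens), len(tokens[0]), len(out), len(out[0]))
```

The attention matrix here has one row and one column per token, so its cost grows quadratically with the number of tubelets. That growth is one reason efficiency is an active research focus for these models.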
Papers
May 15, 2023
May 4, 2023
April 24, 2023
March 17, 2023
March 15, 2023
December 8, 2022
November 11, 2022
October 14, 2022
September 19, 2022
September 15, 2022
August 12, 2022
August 6, 2022
August 2, 2022
July 26, 2022
July 22, 2022
July 19, 2022
July 12, 2022
March 31, 2022
March 10, 2022
March 3, 2022