Video Transformer
Video transformers are deep learning models that process video data using the attention mechanisms of transformer architectures, with the aim of improving video tasks such as action recognition, segmentation, and generation. Current research focuses on improving efficiency, generalizing across domains and datasets, and incorporating multimodal information (e.g., audio, pose) to increase accuracy and robustness. These advances matter for applications including healthcare (remote physiological measurement), robotics (manipulation), and video editing (inpainting, generation), where they enable more accurate and efficient analysis and manipulation of video content.
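To make the idea of attention over video concrete, here is a minimal sketch in plain Python (standard library only). It is illustrative and does not reproduce any specific published model. It assumes a tiny random grayscale clip, splits it into non-overlapping spatiotemporal patches ("tubelets"), and applies single-head scaled dot-product self-attention across all of them. The function names (`make_video`, `tubelet_tokens`, `self_attention`) and all sizes are hypothetical. The query, key, and value projections are taken to be the identity, which is a simplifying assumption: practical models use learned projections, positional embeddings, and multiple heads.

```python
import math
import random


def make_video(frames, height, width, seed=0):
    """Return a random grayscale clip as nested lists: video[t][y][x]."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)]
             for _ in range(height)]
            for _ in range(frames)]


def tubelet_tokens(video, t_size, p_size):
    """Cut the clip into non-overlapping t_size x p_size x p_size tubelets
    and flatten each tubelet into one token vector."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    tokens = []
    for t0 in range(0, T - t_size + 1, t_size):
        for y0 in range(0, H - p_size + 1, p_size):
            for x0 in range(0, W - p_size + 1, p_size):
                tokens.append([video[t][y][x]
                               for t in range(t0, t0 + t_size)
                               for y in range(y0, y0 + p_size)
                               for x in range(x0, x0 + p_size)])
    return tokens


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: Q, K, and V are the tokens themselves (identity
    projections), so there are no learned parameters here.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token, across space and time.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens))
                        for i in range(d)])
    return outputs, weights


video = make_video(frames=4, height=8, width=8)
tokens = tubelet_tokens(video, t_size=2, p_size=4)
attended, weights = self_attention(tokens)
print(len(tokens), "tokens of dimension", len(tokens[0]))
```

Because every token attends to every other token, the cost grows quadratically with the number of tubelets, which is one reason efficiency is an active research focus.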
Papers
February 24, 2022
February 21, 2022
January 25, 2022
January 16, 2022
December 28, 2021
December 14, 2021
December 2, 2021
November 25, 2021
November 24, 2021
November 23, 2021