Open-Vocabulary Action Recognition
Open-vocabulary action recognition (OVAR) aims to recognize actions in video, including action classes never seen during training, by building on vision-language models such as CLIP. Because these models embed video and text in a shared space, an action can be recognized by matching a clip against a free-form text description of it rather than against a fixed label set. Current research focuses on making these models robust to noisy or ambiguous action descriptions, improving cross-domain generalization, and exploring methods such as residual feature distillation and multi-modal prompting to raise accuracy. These advances matter because they support more versatile and adaptable video understanding systems, with applications in video retrieval, automated surveillance, and human-computer interaction.
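To make the matching idea concrete, here is a minimal sketch of CLIP-style zero-shot action classification. It is an illustration under stated assumptions, not the method of any paper listed here: the embeddings are hand-made toy vectors standing in for real encoder outputs, mean pooling over frames is an assumed simplification, and the function names and the temperature value are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pool(frame_embeddings):
    """Average per-frame embeddings into one video-level embedding (assumed pooling)."""
    n = len(frame_embeddings)
    return [sum(col) / n for col in zip(*frame_embeddings)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_open_vocab(frame_embeddings, text_embeddings, temperature=0.01):
    """Score a video against arbitrary action names given as text embeddings.

    frame_embeddings: list of per-frame vectors (stand-ins for image-encoder outputs).
    text_embeddings: dict mapping an action name to its vector
                     (stand-in for text-encoder output of a prompt for that action).
    temperature: assumed scaling factor applied to the cosine similarities.
    """
    video = l2_normalize(mean_pool(frame_embeddings))
    labels = list(text_embeddings)
    sims = [
        sum(a * b for a, b in zip(video, l2_normalize(text_embeddings[label])))
        for label in labels
    ]
    probs = softmax([s / temperature for s in sims])
    return dict(zip(labels, probs))

# Toy data: the label set is supplied at inference time, so new action
# names can be added without retraining anything.
frames = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
text = {
    "surfing": [1.0, 0.0, 0.0],
    "juggling": [0.0, 1.0, 0.0],
    "knitting": [0.0, 0.0, 1.0],
}
probs = classify_open_vocab(frames, text)
best = max(probs, key=probs.get)
print(best, probs)
```

The label set is just a dictionary passed in at call time, which is what makes the vocabulary "open": recognizing a new action only requires embedding its text description. In a real system the toy vectors would come from a pretrained vision-language model's image and text encoders.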
Papers
April 23, 2024
March 3, 2024
February 5, 2024
December 4, 2023
August 22, 2023
March 21, 2023