Zero-Shot Action Recognition

Zero-shot action recognition aims to enable models to identify actions in videos without prior training on those specific action categories, generalizing knowledge learned from seen classes to novel ones. Current research relies heavily on vision-language models (VLMs), often incorporating techniques such as dual visual-text alignment, multimodal prompting, and information compensation to bridge the semantic gap between visual features and textual descriptions of actions. The field is significant because it addresses the scalability and generalization limitations of traditional action recognition methods, with potential impact on applications such as robotics, video surveillance, and assistive technologies.
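
To make the core idea concrete, the sketch below shows a minimal CLIP-style zero-shot baseline, not the method of any particular paper: sampled video frames are encoded and mean-pooled into a video embedding, each candidate action name is turned into a natural-language prompt and encoded with the text encoder, and the most similar prompt gives the prediction. The checkpoint name, the prompt template, and the simple frame-averaging strategy are illustrative assumptions.

```python
# Minimal zero-shot action recognition sketch using a CLIP-style VLM
# (assumes the Hugging Face `transformers` checkpoint
# "openai/clip-vit-base-patch32"; frame sampling is left to the caller).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()


def classify_video(frames: list[Image.Image], action_names: list[str]) -> str:
    """Assign one of the (unseen) action names to a video from sampled frames."""
    # Turn each action label into a natural-language prompt for the text encoder.
    prompts = [f"a video of a person {a}" for a in action_names]

    with torch.no_grad():
        # Encode the sampled frames and mean-pool them into one video embedding.
        image_inputs = processor(images=frames, return_tensors="pt")
        frame_emb = model.get_image_features(**image_inputs)
        video_emb = frame_emb.mean(dim=0, keepdim=True)
        video_emb = video_emb / video_emb.norm(dim=-1, keepdim=True)

        # Encode the action prompts.
        text_inputs = processor(text=prompts, return_tensors="pt", padding=True)
        text_emb = model.get_text_features(**text_inputs)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

        # Cosine similarity between the video embedding and each action
        # description; the best-matching prompt is the predicted action.
        sims = (video_emb @ text_emb.T).squeeze(0)

    return action_names[int(sims.argmax())]
```

Published approaches build on this skeleton by improving the visual-text alignment (e.g., temporal modeling of frames, learned prompts, or richer action descriptions) rather than changing the basic embedding-matching formulation.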

Papers