Paper ID: 2409.14109
Local Patterns Generalize Better for Novel Anomalies
Yalong Jiang, Liquan Mao
Video anomaly detection (VAD) aims at identifying novel actions or events which are unseen during training. Existing mainstream VAD techniques focus on the global patterns of events and cannot properly generalize to novel samples. In this paper, we propose a framework to identify the spatial local patterns which generalize to novel samples and model the dynamics of local patterns. In spatial part of the framework, the capability of extracting local patterns is gained from image-text contrastive learning with Image-Text Alignment Module (ITAM). To detect different types of anomalies, a two-branch framework is proposed for representing the local patterns in both actions and appearances. In temporal part of the framework, a State Machine Module (SMM) is proposed to model the dynamics of local patterns by decomposing their temporal variations into motion components. Different dynamics are represented with different weighted sums of a fixed set of motion components. The video sequences with either novel spatial distributions of local patterns or distinctive dynamics of local patterns are deemed as anomalies. Extensive experiments on popular benchmark datasets demonstrate that state-of-the-art performance can be achieved.
Submitted: Sep 21, 2024