Video Saliency

Video saliency research aims to automatically identify the regions of video frames most likely to attract a viewer's attention, modeling human visual attention. Current work emphasizes improving model accuracy across diverse video types, including 360° and RGB-D video, often using deep learning architectures such as convolutional neural networks and transformers. Some models also incorporate motion or depth cues, or fuse audio and visual signals. These advances matter for applications such as video summarization, content accessibility, and personalized VR/AR experiences, where predicting what viewers are likely to look at allows visual data to be processed and analyzed more efficiently.

Papers