Video Perception

Video perception research aims to enable computers to understand and interpret video content as effectively as humans do, focusing on tasks such as object segmentation, action recognition, and question answering. Current efforts concentrate on integrating large language models (LLMs) with visual processing to improve contextual understanding and reasoning, often using transformer-based architectures and novel quantization techniques for efficiency. These advances matter for applications ranging from automated sports analysis and autonomous driving to video quality assessment and the robustness of computer vision systems against adversarial attacks.

Papers