Source Video
Source video analysis covers research that extracts meaningful information and performs tasks directly from video data. Current work focuses on robust, efficient methods for 3D motion estimation, object detection and tracking, and multimodal analysis that integrates audio and other sensor data, often using deep learning architectures such as transformers and diffusion models. These advances matter for fields including autonomous driving, medical diagnosis, and multimedia content creation, where they enable more sophisticated and automated processing of visual information.
Papers
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li
A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models
Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha
TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
Chuyi Shang, Amos You, Sanjay Subramanian, Trevor Darrell, Roei Herzig
Slightly Shift New Classes to Remember Old Classes for Video Class-Incremental Learning
Jian Jiao, Yu Dai, Hefei Mei, Heqian Qiu, Chuanyang Gong, Shiyuan Tang, Xinpeng Hao, Hongliang Li
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee
TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes
Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Yongqiang Mao, Hanbo Bi, Chenglong Liu, Xian Sun, Kun Fu