Large-Scale Video
Large-scale video research focuses on efficiently processing and understanding large volumes of video data, addressing challenges in annotation, retrieval, and generation. Current efforts concentrate on video-language models built on techniques such as hierarchical embeddings and transformer architectures, with the aim of improving video understanding tasks such as object tracking, activity recognition, and question answering. Applications range from automated video analysis in surveillance and healthcare to content creation and retrieval tools.
Papers
Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices
Yimeng Zhang, Akshay Karkal Kamath, Qiucheng Wu, Zhiwen Fan, Wuyang Chen, Zhangyang Wang, Shiyu Chang, Sijia Liu, Cong Hao
TLDW: Extreme Multimodal Summarisation of News Videos
Peggy Tang, Kun Hu, Lei Zhang, Jiebo Luo, Zhiyong Wang