Video Benchmark
Video benchmarks are standardized datasets and evaluation protocols used to assess video understanding models, driving progress in areas such as action recognition, video question answering, and object tracking. Current research focuses on building more comprehensive benchmarks that address the limitations of existing datasets, such as long videos, continuous perception, and diverse modalities (e.g., visible and thermal). This work includes novel model architectures, such as those incorporating contrastive learning, transformer-based approaches, and memory networks, to improve accuracy and efficiency. The resulting advances in video understanding have significant implications for applications including autonomous driving, video surveillance, and assistive technologies for the visually impaired.
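A core piece of most video benchmark protocols is a fixed scoring rule applied to model outputs. As a minimal illustration (not any specific benchmark's official evaluation code), the sketch below computes top-k accuracy for action recognition: a clip counts as correct if its ground-truth class is among the model's k highest-scoring classes. The function name and toy scores are hypothetical.

```python
from typing import Sequence

def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int = 1) -> float:
    """Fraction of clips whose true label is among the k highest-scoring classes."""
    correct = 0
    for clip_scores, label in zip(scores, labels):
        # Indices of the k classes with the highest predicted scores for this clip.
        top_k = sorted(range(len(clip_scores)),
                       key=lambda i: clip_scores[i],
                       reverse=True)[:k]
        correct += label in top_k
    return correct / len(labels)

# Toy example: 3 clips, 4 action classes.
scores = [
    [0.1, 0.7, 0.1, 0.1],  # predicted class 1
    [0.6, 0.2, 0.1, 0.1],  # predicted class 0
    [0.2, 0.2, 0.5, 0.1],  # predicted class 2
]
labels = [1, 0, 3]  # the last clip is misclassified at top-1
print(top_k_accuracy(scores, labels, k=1))
```

Real benchmarks layer additional conventions on top of this (multi-clip and multi-crop test-time aggregation, per-class averaging for imbalanced label sets), but the reported headline number is usually a variant of this accuracy.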
Papers
LLaVAction: evaluating and training multi-modal large language models for action recognition
Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis (EPFL)

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht (Goethe University Frankfurt; Tuebingen AI Center/University of Tuebingen; MPI for Informatics; University of Oxford; MIT-IBM Watson AI Lab)