Multiple Choice VideoQA
Multiple Choice VideoQA focuses on developing systems that accurately answer questions about video content by selecting the correct answer from a set of candidate options. Current research emphasizes improving model robustness and interpretability, particularly by addressing challenges such as temporal reasoning, handling diverse question types, and mitigating biases in training data. This involves exploring various architectures, including transformer-based models, graph neural networks, and the integration of large language models, often incorporating techniques such as contrastive learning and attention mechanisms to better align visual and textual information. Advances in this field have significant implications for applications such as video indexing, retrieval, and content analysis, and for multimodal reasoning more broadly.
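At inference time, many of these systems reduce to scoring each candidate answer against a fused video-question representation and picking the best match. The sketch below is a minimal, hypothetical illustration of that selection step: the function name `select_answer` and the random vectors standing in for real encoder outputs are assumptions for the example, not any specific model from the papers listed.

```python
import numpy as np

def select_answer(fused_emb, option_embs):
    """Score answer options by cosine similarity to a fused
    video+question embedding and return the argmax index
    plus a softmax distribution over the options."""
    v = fused_emb / np.linalg.norm(fused_emb)
    opts = option_embs / np.linalg.norm(option_embs, axis=1, keepdims=True)
    scores = opts @ v                      # cosine similarity per option
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

# Stand-in embeddings: in a real system these would come from a
# multimodal encoder (video + question) and a text encoder (options).
rng = np.random.default_rng(0)
fused = rng.normal(size=256)
options = rng.normal(size=(5, 256))
options[3] = fused + 0.1 * rng.normal(size=256)  # make option 3 the close match

idx, probs = select_answer(fused, options)
print(idx)  # index of the highest-scoring option
```

Contrastive training objectives in this setting typically push the fused embedding toward the correct option's embedding and away from the distractors, so that this simple similarity ranking becomes reliable at test time.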
Papers
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Ziyi Bai, Ruiping Wang, Xilin Chen
Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen