Multiple Choice VideoQA
Multiple Choice VideoQA focuses on developing systems that accurately answer questions about video content by selecting the correct answer from a set of candidate options. Current research emphasizes improving model robustness and interpretability, particularly by addressing challenges such as temporal reasoning, handling diverse question types, and mitigating biases in training data. This involves exploring various architectures, including transformer-based models, graph neural networks, and the integration of large language models, often incorporating techniques such as contrastive learning and attention mechanisms to better align visual and textual information. Advances in this field have significant implications for applications such as video indexing, retrieval, and content analysis, and for multimodal reasoning more broadly.
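At inference time, many of these systems reduce to scoring each candidate answer against a fused video-question representation and picking the best match. The sketch below is a minimal, hypothetical illustration of that selection step: the function name `select_answer` and the random vectors standing in for real encoder outputs are assumptions for the example, not any specific model from the papers listed.

```python
import numpy as np

def select_answer(fused_emb, option_embs):
    """Score answer options by cosine similarity to a fused
    video+question embedding and return the argmax index
    plus a softmax distribution over the options."""
    v = fused_emb / np.linalg.norm(fused_emb)
    opts = option_embs / np.linalg.norm(option_embs, axis=1, keepdims=True)
    scores = opts @ v                      # cosine similarity per option
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

# Stand-in embeddings: in a real system these would come from a
# multimodal encoder (video + question) and a text encoder (options).
rng = np.random.default_rng(0)
fused = rng.normal(size=256)
options = rng.normal(size=(5, 256))
options[3] = fused + 0.1 * rng.normal(size=256)  # make option 3 the close match

idx, probs = select_answer(fused, options)
print(idx)  # index of the highest-scoring option
```

Contrastive training objectives in this setting typically push the fused embedding toward the correct option's embedding and away from the distractors, so that this simple similarity ranking becomes reliable at test time.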
Papers
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Ziyi Bai, Ruiping Wang, Xilin Chen
Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li, Andong Deng, Qiuhong Ke, Jun Liu, Hossein Rahmani, Yulan Guo, Bernt Schiele, Chen Chen