VQA Datasets
Visual Question Answering (VQA) datasets are collections of images paired with questions and answers, used to train and evaluate AI models capable of understanding and reasoning about visual information. Current research focuses on improving model performance through techniques like attention mechanisms guided by image segmentation, question decomposition for multi-hop reasoning, and leveraging external knowledge bases or large language models. These advancements are driving the development of more robust and accurate VQA systems, with applications ranging from assisting users with software-related questions to enhancing blind video quality assessment and improving multimodal information retrieval in complex documents. The creation of diverse and challenging datasets, including those focused on multilingual capabilities and safety considerations, is crucial for pushing the boundaries of this rapidly evolving field.
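To make concrete what such datasets contain and how models are typically scored against them, the sketch below shows a minimal VQA-style record (an image, a question, and several human answers) together with the consensus accuracy metric popularized by the original VQA benchmark, where a prediction counts as fully correct if at least three annotators gave it. The field names, file path, and sample answers are purely illustrative and not drawn from any specific dataset mentioned above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VQAExample:
    """One record in a typical VQA dataset: an image, a question,
    and multiple free-form human answers (VQA v2-style records carry 10)."""
    image_path: str
    question: str
    answers: List[str]

def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Consensus accuracy: min(#annotators who gave the answer / 3, 1)."""
    matches = sum(
        ans.strip().lower() == prediction.strip().lower()
        for ans in human_answers
    )
    return min(matches / 3.0, 1.0)

# Illustrative usage with made-up data (hypothetical path and answers).
example = VQAExample(
    image_path="images/000123.jpg",
    question="What color is the bus?",
    answers=["red", "red", "red", "maroon", "red",
             "red", "red", "dark red", "red", "red"],
)
print(vqa_accuracy("red", example.answers))     # -> 1.0
print(vqa_accuracy("maroon", example.answers))  # -> ~0.33
```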
Papers
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang, Wei Sun, Xinyue Li, Yunhao Li, Qihang Ge, Jun Jia, Zicheng Zhang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong Liu