Multi-Modal Benchmark
Multi-modal benchmarks are crucial for evaluating the performance of models that process and integrate information from multiple data types (e.g., text, images, audio). Current research focuses on developing comprehensive benchmarks that address limitations in existing datasets, such as insufficient diversity, lack of long-context understanding, and potential data leakage, often employing large language models (LLMs) for data generation and annotation. These benchmarks are vital for advancing the development of robust multi-modal models and improving applications across diverse fields, including video understanding, document analysis, and e-commerce.
21 papers
Papers
March 17, 2025
Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning
Mengyao Lyu, Yan Li, Huasong Zhong, Wenhao Yang, Hui Chen, Jungong Han, Guiguang Ding, Zhenheng Yang
Tsinghua University ● BNRist ● Bytedance
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park, Can Cui, Yunsheng Ma, Ahmadreza Moradipari, Rohit Gupta, Kyungtae Han, Ziran Wang
Purdue University ● Toyota InfoTech Labs
October 14, 2024
LiveXiv – A Multi-Modal Live Benchmark Based on Arxiv Papers Content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh, Wei Lin, M. Jehanzeb Mirza, Leshem Choshen, Mikhail Yurochkin, Yuekai Sun, Assaf Arbelle +2
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian, Bingquan Liu, Chengjie Sun, Zhen Xu, Baoxun Wang
July 19, 2024