Multimodal Large Language Model
Multimodal large language models (MLLMs) integrate multiple data modalities, such as text, images, and audio, to enhance understanding and reasoning capabilities beyond those of unimodal models. Current research emphasizes improving MLLM performance through refined architectures (e.g., incorporating visual grounding, chain-of-thought prompting), mitigating biases and hallucinations, and developing robust evaluation benchmarks that assess various aspects of multimodal understanding, including active perception and complex reasoning tasks. This work is significant because it pushes the boundaries of AI capabilities, leading to advancements in diverse applications like medical diagnosis, financial analysis, and robotic manipulation.
Papers
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games
Nicholas R. Waytowich, Devin White, MD Sunbeam, Vinicius G. Goecks
A Survey on Evaluation of Multimodal Large Language Models
Jiaxing Huang, Jingyi Zhang
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, Dacheng Tao
LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
Jainaveen Sundaram, Ravishankar Iyer
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang, Bin Shan, Wei Shi, Kun-Yu Lin, Xiang Fei, Guozhi Tang, Lei Liao, Jingqun Tang, Can Huang, Wei-Shi Zheng
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin
Semantic Alignment for Multimodal Large Language Models
Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly, Sriharsha Mopidevi, Andrew Zolensky, Eric Eaton, Insup Lee, Kevin Johnson
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese
Khang T. Doan, Bao G. Huynh, Dung T. Hoang, Thuc D. Pham, Nhat H. Pham, Quan T. M. Nguyen, Bang Q. Vo, Suong N. Hoang
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang, Jia Hongrui, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin, Yaqi Zhao, Yajie Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang
Open-Ended 3D Point Cloud Instance Segmentation
Phuc D. A. Nguyen, Minh Luu, Anh Tran, Cuong Pham, Khoi Nguyen
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model
Mengying Ge, Dongkai Tang, Mingyang Li
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, Yilun Zhao, Yitao Long, Guojun Xiong, Kaleb Smith, Honghai Yu, Yanzhao Lai, Min Peng, Jianyun Nie, Jordan W. Suchow, Xiao-Yang Liu, Benyou Wang, Alejandro Lopez-Lira, Jimin Huang, Sophia Ananiadou
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant
Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang, Jiaya Jia
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong