Multi-Modal Large Language Model
Multi-modal large language models (MLLMs) integrate visual and textual information to perform tasks that require reasoning over both modalities. Current research focuses on improving the consistency and fairness of MLLMs, exploring efficient fusion mechanisms (such as early fusion and Mixture-of-Experts architectures), and developing benchmarks that evaluate performance across diverse tasks, including medical image analysis and autonomous driving. The field is evolving rapidly, with potential applications ranging from healthcare diagnostics to robotics, where systems must handle real-world complexity robustly and reliably.
Papers
Benchmarking Mobile Device Control Agents across Diverse Configurations
Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, Kimin Lee
Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip Torr, Wei Liu, Zhifeng Li
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal
VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
Bufang Yang, Lixing He, Kaiwei Liu, Zhenyu Yan
Evaluating GPT-4 with Vision on Detection of Radiological Findings on Chest Radiographs
Yiliang Zhou, Hanley Ong, Patrick Kennedy, Carol Wu, Jacob Kazam, Keith Hentel, Adam Flanders, George Shih, Yifan Peng
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao, Min Zhang, Wei Zhao, Pengxiang Ding, Siteng Huang, Donglin Wang
Dermacen Analytica: A Novel Methodology Integrating Multi-Modal Large Language Models with Machine Learning in tele-dermatology
Dimitrios P. Panagoulias, Evridiki Tsoureli-Nikita, Maria Virvou, George A. Tsihrintzis