Multimodal Phenomenon
Multimodal research focuses on developing artificial intelligence systems that can effectively process and integrate information from multiple data sources (e.g., text, images, audio, video). Current efforts concentrate on improving the robustness and accuracy of multimodal large language models (MLLMs) through techniques like chain-of-thought prompting, contrastive learning, and multimodal masked autoencoders, often addressing challenges such as hallucination mitigation and efficient resource utilization on edge devices. This field is significant because it enables more comprehensive and nuanced understanding of complex phenomena, with applications ranging from improved medical diagnosis and drug discovery to enhanced human-computer interaction and more effective educational tools. The development of robust benchmarks and open-source tools is also a key area of focus to facilitate collaborative research and development.
Papers
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Ziyao Shangguan, Chuhan Li, Yuxuan Ding, Yanan Zheng, Yilun Zhao, Tesca Fitzgerald, Arman Cohan
PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures
Tianxiang Wu, Minxin Nie, Ziqiang Cao
Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Min Wu, Ming-Ming Cheng, Ender Konukoglu, Serge Belongie
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
Ashutosh Chaubey, Anoubhav Agarwaal, Sartaki Sinha Roy, Aayush Agarwal, Susmita Ghose
Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements
Silvia Terragni, Hoang Cuong, Joachim Daiber, Pallavi Gudipati, Pablo N. Mendes
Improving Multimodal Large Language Models Using Continual Learning
Shikhar Srivastava, Md Yousuf Harun, Robik Shrestha, Christopher Kanan
Conformal Prediction for Multimodal Regression
Alexis Bose, Jonathan Ethier, Paul Guinand
Personalized Recommendation Systems using Multimodal, Autonomous, Multi Agent Systems
Param Thakkar, Anushka Yadav
IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing
Kang Chen, Qingheng Zhang, Chengbao Lian, Yixin Ji, Xuwei Liu, Shuguang Han, Guoqiang Wu, Fei Huang, Jufeng Chen
$γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji, Yiyi Zhou, Xiaoshuai Sun, Zhiqiang Shen, Rongrong Ji
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo