Multimodal Topic Modeling

Multimodal topic modeling aims to discover the underlying themes of datasets that contain multiple data types, such as text and images, by analyzing the modalities jointly rather than in isolation. Current research focuses on neural network-based models, often incorporating large language models or graph-based representations to integrate and interpret the different modalities, as well as on better evaluation metrics for the coherence and diversity of the discovered topics. The field matters for applications such as social media analysis (e.g., understanding meme trends) and relation extraction, where combining visual and textual signals improves accuracy and yields richer insights.
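
To make the general idea concrete, the sketch below shows one common embedding-and-clustering recipe: text and images are mapped into a shared CLIP space, the per-document embeddings are fused, and clusters of fused embeddings are read off as topics. The dataset, the averaging fusion, the number of topics, and the TF-IDF topic descriptions are illustrative assumptions, not the method of any particular paper.

```python
# Minimal multimodal topic modeling sketch, assuming a small corpus of
# (caption, image_path) pairs. All data, model choices, and the simple
# averaging fusion are hypothetical illustrations.
from PIL import Image
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

# Hypothetical dataset: each document has a caption and an associated image.
docs = [("cats playing with yarn", "cat1.jpg"),
        ("a kitten chasing a ball", "cat2.jpg"),
        ("stock market hits record high", "chart1.jpg"),
        ("investors react to interest rate cut", "chart2.jpg")]

# CLIP maps images and text into a shared embedding space.
clip = SentenceTransformer("clip-ViT-B-32")
text_emb = clip.encode([caption for caption, _ in docs])
img_emb = clip.encode([Image.open(path) for _, path in docs])

# Fuse the modalities by simple averaging (research models typically learn this fusion).
doc_emb = (text_emb + img_emb) / 2.0

# Cluster the fused embeddings; each cluster is treated as a topic.
n_topics = 2
labels = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(doc_emb)

# Describe each topic by the top TF-IDF terms of its member captions.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([caption for caption, _ in docs])
vocab = np.array(vectorizer.get_feature_names_out())
for t in range(n_topics):
    scores = np.asarray(tfidf[labels == t].mean(axis=0)).ravel()
    top_terms = vocab[scores.argsort()[::-1][:3]]
    print(f"topic {t}: {', '.join(top_terms)}")
```

A learned fusion network, a contrastive objective, or an LLM-generated topic label can replace the averaging and TF-IDF steps; the overall pipeline of joint embedding, clustering, and topic description stays the same.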

Papers