Paper ID: 2408.14419
CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models
Shubham Bharti, Shiyun Cheng, Jihyun Rho, Martina Rao, Xiaojin Zhu
We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data-visualizing charts. Given a chart, a language model must not only correctly comprehend the chart (the FACT question) but also judge whether the chart will be misleading to a human reader (the MIND question). Correctly answering both questions has significant societal benefits. We detail the construction of the CHARTOM benchmark, including its calibration on human performance.
Submitted: Aug 26, 2024