Failures in Perspective-taking of Multimodal AI Systems [2409.13929]