Visual Linguistic Causal

Visual-linguistic causal reasoning studies causal relationships between visual and textual information, with the aim of building AI systems that can reason about cause and effect from multimodal data. Current research emphasizes methods that separate spurious correlations from true causal links, using techniques such as causal intervention and front-door adjustment, often implemented within transformer-based architectures. This matters for the robustness and reliability of applications such as video question answering and medical report generation, where a model that relies on dataset shortcuts can fail on complex visual-linguistic scenarios. Open-source toolboxes and benchmarks are facilitating collaborative progress and wider adoption of these methods.
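
As a rough illustration of the front-door adjustment mentioned above, the sketch below uses a hypothetical toy model with binary variables: a hidden confounder U, a cause X, a mediator M, and an outcome Y, wired as U -> X, U -> Y, X -> M, M -> Y. All probability tables and function names are invented for illustration and are not taken from any particular paper or toolbox. The sketch recovers P(y | do(x)) from the observational joint P(x, m, y) alone, using the identity P(y | do(x)) = sum_m P(m | x) sum_x' P(y | x', m) P(x'), and compares it with the plain conditional P(y | x), which the confounder biases.

```python
import numpy as np

# Hypothetical toy model (all numbers are assumptions for illustration):
# U -> X, U -> Y, X -> M, M -> Y, with U unobserved.
p_u = np.array([0.6, 0.4])                     # P(u)
p_x_u = np.array([[0.8, 0.2], [0.3, 0.7]])     # P(x | u), indexed [u, x]
p_m_x = np.array([[0.9, 0.1], [0.2, 0.8]])     # P(m | x), indexed [x, m]
p_y1_mu = np.array([[0.1, 0.5], [0.6, 0.9]])   # P(y=1 | m, u), indexed [m, u]
p_y_mu = np.stack([1 - p_y1_mu, p_y1_mu], axis=-1)  # P(y | m, u), indexed [m, u, y]

# Observational joint P(x, m, y); the confounder U is summed out.
joint = np.einsum("u,ux,xm,muy->xmy", p_u, p_x_u, p_m_x, p_y_mu)


def front_door(joint, x):
    """Estimate P(y | do(x)) from P(x, m, y) via front-door adjustment."""
    p_x = joint.sum(axis=(1, 2))               # P(x')
    p_xm = joint.sum(axis=2)                   # P(x', m)
    p_m_given_x = p_xm / p_x[:, None]          # P(m | x)
    p_y_given_xm = joint / p_xm[:, :, None]    # P(y | x', m)
    # inner[m, y] = sum over x' of P(y | x', m) * P(x')
    inner = np.einsum("xmy,x->my", p_y_given_xm, p_x)
    return p_m_given_x[x] @ inner              # sum over m of P(m | x) * inner[m, y]


def ground_truth(x):
    """P(y | do(x)) computed with access to the hidden confounder U."""
    return np.einsum("m,u,muy->y", p_m_x[x], p_u, p_y_mu)


def naive_conditional(joint, x):
    """Plain P(y | x), which mixes the causal effect with confounding."""
    p_xy = joint.sum(axis=1)
    return p_xy[x] / p_xy[x].sum()


if __name__ == "__main__":
    for x in (0, 1):
        print("x =", x)
        print("  front-door :", front_door(joint, x))
        print("  ground truth:", ground_truth(x))
        print("  naive P(y|x):", naive_conditional(joint, x))
```

In this toy setting the front-door estimate matches the interventional distribution while the naive conditional does not. Models in this area typically approximate the same sums over learned visual and textual features rather than enumerating a small discrete table.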

Papers