Visual Scanpaths

Visual scanpaths, the sequences of gaze fixations during scene exploration, are a key area of research in vision science and artificial intelligence, aiming to understand and predict human visual attention. Current research focuses on developing increasingly sophisticated models, including transformer-based architectures and those leveraging contrastive language-image pre-training, to accurately predict scanpaths in various contexts, such as free viewing and task-driven scenarios (e.g., image captioning), and even in 3D environments. These advancements are improving our understanding of human cognitive processes and have implications for applications like virtual/augmented reality and natural language processing, where predicting gaze behavior can enhance user experience and model performance.

Papers