Visual Story Generation
Visual story generation aims to automatically create narratives accompanied by corresponding images or videos, with an emphasis on coherent, engaging stories that maintain consistent characters and scenes. Current research focuses on improving character consistency and coreference across generated sequences, often employing diffusion models together with techniques such as adaptive context modeling and modified self-attention mechanisms that strengthen image and video coherence. This field is significant for advancing AI capabilities in creative content generation, with applications ranging from interactive entertainment and assistive technologies to automated media production. Recent work also explores training methods that reduce reliance on paired image-text data, improving scalability and generalization.
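To make the self-attention idea concrete, below is a minimal NumPy sketch of one family of such mechanisms: cross-frame ("consistent") self-attention, in which each frame's queries attend to keys and values pooled from all frames in the sequence so that character features are shared across images. The function name, shapes, and single-head formulation are illustrative assumptions, not a specific published method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def consistent_self_attention(frames, Wq, Wk, Wv):
    """Single-head cross-frame attention sketch.

    frames: (F, N, d) array — F frames, N tokens per frame, feature dim d.
    Each frame's tokens attend to keys/values pooled across ALL frames,
    which encourages consistent character appearance across the sequence.
    """
    F, N, d = frames.shape
    q = frames @ Wq                       # (F, N, d) per-frame queries
    k = (frames @ Wk).reshape(F * N, d)   # keys pooled across frames
    v = (frames @ Wv).reshape(F * N, d)   # values pooled across frames
    attn = softmax(q @ k.T / np.sqrt(d))  # (F, N, F*N) attention weights
    return attn @ v                       # (F, N, d) updated tokens

# Illustrative usage with random projections.
rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 4, 8))       # 3 frames, 4 tokens, dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = consistent_self_attention(frames, Wq, Wk, Wv)
print(out.shape)  # (3, 4, 8)
```

In ordinary per-frame self-attention, keys and values would be restricted to the same frame; the only change here is the pooling step, which is why variants of this idea drop into existing diffusion backbones with little architectural change.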