Visual Storytelling
Visual storytelling research focuses on automatically generating coherent narratives from image sequences or single images, aiming to bridge the gap between visual and textual information. Current efforts leverage large language models (LLMs) and vision-language models (VLMs), often combined with diffusion models and reinforcement learning, to improve narrative coherence, visual grounding, and emotional resonance. These advances are driving progress in diverse applications, including automated content creation, enhanced accessibility for visually impaired users, and novel approaches to human-computer interaction in areas such as psychotherapy. The field is also actively developing robust evaluation metrics to better assess the quality and human-likeness of generated stories.
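The caption-then-compose pattern common to these pipelines can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration: `caption` stands in for a real vision-language model (a production system would call a VLM per frame and an LLM to compose the narrative), and the stitching step simply adds temporal connectives for surface coherence.

```python
from typing import Callable, List

# Hypothetical captioner: a stand-in for a vision-language model that
# maps an image identifier to a one-sentence description.
def caption(image_id: str) -> str:
    demo_captions = {
        "img1": "A child opens the front door on a snowy morning.",
        "img2": "The child builds a small snowman in the yard.",
        "img3": "The snowman wears a red scarf at sunset.",
    }
    return demo_captions.get(image_id, "An unrecognized scene.")

def tell_story(image_ids: List[str],
               captioner: Callable[[str], str] = caption) -> str:
    """Caption each frame, then stitch the captions into one narrative,
    inserting simple temporal connectives for coherence."""
    connectives = ["First,", "Then,", "Finally,"]
    sentences = []
    for i, img in enumerate(image_ids):
        conn = connectives[min(i, len(connectives) - 1)]
        # Lowercase the caption and re-terminate it so it reads as a clause.
        sentences.append(f"{conn} {captioner(img).rstrip('.').lower()}.")
    return " ".join(sentences)

story = tell_story(["img1", "img2", "img3"])
print(story)
```

In a real system the connective-based stitching would be replaced by an LLM prompt that rewrites the per-frame captions into a grounded, emotionally coherent narrative; the skeleton above only shows the data flow.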