Story Visualization
Story visualization is the task of generating images or videos that coherently depict a textual narrative, with the goal of improving comprehension of and engagement with stories across many fields. Current research pairs large language models (LLMs) with diffusion models and transformers to produce visually consistent, contextually relevant image sequences, often adding techniques such as spatial-temporal attention and character-centric modeling to strengthen coherence. The field is significant for human-computer interaction, data storytelling, and educational and entertainment media, with ongoing work aimed at improving efficiency, reducing computational cost, and enhancing the controllability of the generated visualizations.
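To make the spatial-temporal attention idea concrete, here is a minimal NumPy sketch: spatial attention mixes information among patches within a single frame, while temporal attention lets the same patch position attend across frames, which is one way models keep characters and scenes consistent from panel to panel. This is an illustrative toy with random inputs and no learned projections, not the implementation from any of the papers below, where such blocks sit inside trained diffusion backbones.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention, batched over leading axes
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def spatial_temporal_attention(x):
    """x: (frames, patches, dim).

    Spatial step: each frame's patches attend to one another.
    Temporal step: each patch position attends across frames,
    propagating appearance information through the sequence.
    """
    x = x + attention(x, x, x)            # spatial: attends over the patch axis
    xt = x.transpose(1, 0, 2)             # (patches, frames, dim)
    xt = xt + attention(xt, xt, xt)       # temporal: attends over the frame axis
    return xt.transpose(1, 0, 2)          # back to (frames, patches, dim)

rng = np.random.default_rng(0)
seq = rng.standard_normal((4, 16, 32))    # 4 story frames, 16 patches, 32-dim features
out = spatial_temporal_attention(seq)
print(out.shape)                          # (4, 16, 32)
```

Real systems interleave many such blocks with learned query/key/value projections and cross-attention to the text prompt; the toy above only shows the axis-swapping pattern that separates spatial from temporal mixing.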
Papers
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
Jinlu Zhang, Jiji Tang, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun