Text to Video
Text-to-video (T2V) generation aims to create realistic videos from textual descriptions. Research in this area focuses on improving temporal consistency, handling multiple objects and actions, and enhancing controllability. Current work relies heavily on diffusion models, often building on pre-trained text-to-image models and incorporating architectures such as Diffusion Transformers (DiT) and spatial-temporal attention mechanisms to improve video quality and coherence. The field is evolving rapidly, with implications for content creation, education, and other applications, and it is driving advances in both model architectures and evaluation methodologies that target challenges such as hallucination and compositional generation.
Papers
October 23, 2023
October 16, 2023
September 28, 2023
September 25, 2023
September 14, 2023
August 24, 2023
August 22, 2023
August 18, 2023
August 12, 2023
July 4, 2023
June 2, 2023
May 23, 2023
May 18, 2023
May 10, 2023
May 6, 2023
May 4, 2023
April 18, 2023
March 16, 2023