Diffusion Transformer
Diffusion Transformers (DiTs) are a class of generative models that replace the convolutional U-Net backbone of traditional diffusion models with a transformer, aiming for efficient, high-quality generation across data modalities including images, audio, and video. Current research focuses on making DiTs faster and cheaper through techniques such as dynamic computation, token caching, and quantization, and on applying them to diverse tasks such as image super-resolution, text-to-speech synthesis, and medical image segmentation. Their improved efficiency and scalability, together with their ability to model complex data dependencies, are making DiTs influential in generative modeling across many scientific fields and practical applications.
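At the core of a DiT is a standard transformer block operating on patchified latent tokens, conditioned on the diffusion timestep, typically via adaLN-Zero modulation as introduced in the original DiT paper (Peebles & Xie, 2023). Below is a minimal PyTorch sketch of one such block; the class name DiTBlock, the toy dimensions, and the single-block forward pass are illustrative assumptions, not any listed paper's implementation.

```python
import math
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One transformer block with adaLN-Zero conditioning: the timestep
    embedding produces per-block shift/scale/gate parameters that modulate
    LayerNorm and gate each residual branch."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        # Zero-initialized modulation: each block starts as the identity
        # ("adaLN-Zero"), which stabilizes training at scale.
        self.adaLN = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
        nn.init.zeros_(self.adaLN[1].weight)
        nn.init.zeros_(self.adaLN[1].bias)

    def forward(self, x, c):
        # x: (batch, tokens, dim) latent patch tokens; c: (batch, dim) conditioning.
        s1, b1, g1, s2, b2, g2 = self.adaLN(c).unsqueeze(1).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1) + b1
        x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2) + b2
        x = x + g2 * self.mlp(h)
        return x

def timestep_embedding(t, dim):
    """Standard sinusoidal embedding of diffusion timesteps."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

# Toy forward pass: a batch of 2 sequences of 256 latent tokens.
x = torch.randn(2, 256, 384)
c = timestep_embedding(torch.tensor([10, 500]), 384)
block = DiTBlock(dim=384, num_heads=6)
print(block(x, c).shape)  # torch.Size([2, 256, 384])
```

A full model would stack many such blocks, add positional embeddings over the latent patches, and project the output tokens back onto the latent grid; acceleration techniques such as token caching and lazy computation (e.g., LazyDiT below) exploit the redundancy of these blocks' activations across adjacent denoising steps.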
Papers
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Jingfeng Yao, Xinggang Wang
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang
EliGen: Entity-Level Controlled Image Generation with Regional Attention
Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, Yu Zhang
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
Lianghua Huang, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Chen Liang, Tong Shen, Han Zhang, Huanzhang Dou, Yu Liu, Jingren Zhou
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Yanyu Li, Yifan Gong, Kai Zhang, Hao Tan, Jason Kuen, Henghui Ding, Zhihao Shu, Wei Niu, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, Xun Huang