Diffusion Transformer
Diffusion Transformers (DiTs) are a class of generative models that use the transformer architecture in place of the U-Net backbone of traditional diffusion models, aiming for efficient, high-quality generation across data modalities including images, audio, and video. Current research focuses on making DiTs faster and cheaper through techniques such as dynamic computation, feature caching across denoising steps, and quantization, and on applying them to diverse tasks such as image super-resolution, text-to-speech synthesis, and medical image reconstruction. Their scalability and ability to model complex data dependencies are influencing generative modeling across many scientific fields and practical applications.
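The caching idea behind several of the papers below (e.g. FORA) exploits the observation that transformer-block activations change slowly between adjacent denoising steps, so they can be recomputed only every few steps and reused in between. A minimal toy sketch of this pattern follows; the class and parameter names (`ToyBlock`, `CachedDiT`, `interval`) are hypothetical illustrations, not any paper's actual implementation, and a plain linear map stands in for a real attention/MLP block.

```python
import numpy as np

class ToyBlock:
    """Stand-in for a DiT transformer block: a fixed linear map."""
    def __init__(self, dim, seed):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.calls = 0  # count real forward passes to measure savings

    def forward(self, x):
        self.calls += 1
        return x @ self.w

class CachedDiT:
    """Recompute block outputs every `interval` denoising steps,
    reuse the cached outputs on the steps in between."""
    def __init__(self, dim=8, depth=4, interval=3):
        self.blocks = [ToyBlock(dim, s) for s in range(depth)]
        self.interval = interval
        self.cache = {}  # block index -> cached activation

    def denoise_step(self, x, step):
        refresh = (step % self.interval == 0)
        for i, blk in enumerate(self.blocks):
            if refresh or i not in self.cache:
                self.cache[i] = blk.forward(x)  # real compute
            x = self.cache[i]                   # cheap reuse otherwise
        return x

model = CachedDiT(dim=8, depth=4, interval=3)
x = np.ones(8)
for step in range(9):  # 9 denoising steps
    x = model.denoise_step(x, step)

total_calls = sum(b.calls for b in model.blocks)
print(total_calls)  # refreshes at steps 0, 3, 6 -> 3 * 4 blocks = 12 calls
```

With an interval of 3, only a third of the block forward passes are executed (12 instead of 36 here), which is the source of the reported speedups; the papers differ mainly in how they decide which layers or steps are safe to cache.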
Papers
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration
Pratheba Selvaraju, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Luming Liang
On Statistical Rates and Provably Efficient Criteria of Latent Diffusion Transformers (DiTs)
Jerry Yao-Chieh Hu, Weimin Wu, Zhao Song, Han Liu
Diffusion Transformer Model With Compact Prior for Low-dose PET Reconstruction
Bin Huang, Xubiao Liu, Lei Fang, Qiegen Liu, Bingxuan Li
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang
$\Delta$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen
Virtual avatar generation models as world navigators
Sai Mandava