Diffusion Transformer
Diffusion Transformers (DiTs) are a class of generative models that replace the U-Net denoising backbone of traditional diffusion models with a transformer, aiming for efficient, high-quality generation across data modalities including images, audio, and video. Current research focuses on optimizing DiT architectures for speed and efficiency through techniques such as dynamic computation, token caching, and quantization, and on applying them to diverse tasks such as image super-resolution, text-to-speech synthesis, and medical image segmentation. The efficiency and scalability of DiTs, together with their ability to model complex data dependencies, are shaping generative modeling across many scientific fields and practical applications.
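To make the acceleration theme concrete, below is a minimal, hypothetical sketch of the feature-caching idea in DiT sampling: intermediate block outputs drift slowly across adjacent denoising steps, so the contribution of deeper blocks can be cached and reused instead of recomputed at every step. The class names, the cache_from/cache_interval parameters, and the residual-caching scheme are illustrative assumptions, not the method of any listed paper.

```python
# Minimal sketch of the feature-caching idea used to accelerate DiT sampling.
# All names (DiTBlock, CachedDiT, cache_from, cache_interval) and the
# residual-caching scheme are illustrative assumptions, not any paper's method.
import torch
import torch.nn as nn


class DiTBlock(nn.Module):
    """Pre-norm transformer block operating on a sequence of image tokens."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


class CachedDiT(nn.Module):
    """Recomputes the deep blocks' residual contribution only every
    `cache_interval` sampling steps; other steps reuse the cached value,
    exploiting the slow drift of features across adjacent denoising steps."""

    def __init__(self, dim=384, num_heads=6, depth=8, cache_from=4, cache_interval=2):
        super().__init__()
        self.blocks = nn.ModuleList(DiTBlock(dim, num_heads) for _ in range(depth))
        self.cache_from = cache_from          # first block whose output may be cached
        self.cache_interval = cache_interval  # recompute cadence, in sampling steps
        self._cached_delta = None

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        for block in self.blocks[: self.cache_from]:   # always run shallow blocks
            x = block(x)
        if step % self.cache_interval == 0 or self._cached_delta is None:
            deep_in = x
            for block in self.blocks[self.cache_from :]:
                x = block(x)
            self._cached_delta = x - deep_in           # refresh cached contribution
        else:
            x = x + self._cached_delta                 # reuse cached contribution
        return x


# Usage: tokens shaped (batch, num_patches, dim); `step` indexes the sampler loop.
# The first call fills the cache; later off-interval steps reuse it.
model = CachedDiT()
tokens = torch.randn(2, 256, 384)
out = model(tokens, step=7)
```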
Papers
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou, Shansong Liu, Ruibin Yuan, Wei Xue, Ying Shan, Mangsuo Zhao, Chao Zhang
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun, Tao Lei, Bowen Zhang, Yanghao Li, Haoshuo Huang, Ruoming Pang, Bo Dai, Nan Du
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
Yushi Huang, Zining Wang, Ruihao Gong, Jing Liu, Xinjie Zhang, Jinyang Guo, Xianglong Liu, Jun Zhang
MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang
DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
Yusuke Yoshiyasu, Leyuan Sun