Efficient Generative Modeling
Efficient generative modeling focuses on producing high-quality outputs from generative models (such as large language models and diffusion models) while minimizing compute and latency. Current research emphasizes optimizing inference speed and memory usage through techniques such as dynamic key-value cache management, efficient compression algorithms, and novel model architectures like diffusion GANs. These advances are crucial for deploying large generative models on resource-constrained devices and for improving the scalability and accessibility of generative AI across applications including text generation, image synthesis, and scientific simulations.
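To make the dynamic key-value cache management mentioned above concrete, here is a minimal sketch of one common heuristic: evicting cached tokens with the lowest cumulative attention scores to fit a fixed memory budget. The function name `evict_kv_cache` and the toy shapes are illustrative assumptions, not the method of any specific paper listed here.

```python
import numpy as np

def evict_kv_cache(keys, values, attn_scores, budget):
    """Keep only the `budget` cached tokens with the highest cumulative
    attention scores -- a simple heuristic for dynamic KV-cache eviction.

    keys, values: arrays of shape (seq_len, head_dim)
    attn_scores:  array of shape (seq_len,), one score per cached token
    """
    # Indices of the `budget` most-attended tokens.
    keep = np.argsort(attn_scores)[-budget:]
    keep.sort()  # preserve the original token order in the cache
    return keys[keep], values[keep]

# Toy example: a cache of 6 tokens, pruned down to the 3 most-attended.
rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 4))
values = rng.normal(size=(6, 4))
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.8, 0.2])
k, v = evict_kv_cache(keys, values, scores, budget=3)
print(k.shape)  # (3, 4)
```

Real systems refine this idea with per-head budgets, recency bonuses, or merging of evicted entries, but the core trade-off (memory versus retained context) is the same.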
Papers
Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models
Paul Henderson, Melonie de Almeida, Daniela Ivanova, Titas Anciukevičius
D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models
Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang