DeepSpeed Ulysses
DeepSpeed-Ulysses is a system optimization for efficiently training transformer models on extremely long sequences, addressing a key limitation of current large language models (LLMs) and vision transformers (ViTs). It partitions input tensors along the sequence dimension and wraps attention in all-to-all collectives, so that each device attends over the full sequence for a subset of attention heads, keeping per-device communication volume constant as sequence length scales with the number of devices. Research in this area combines sequence parallelism with other strategies, such as pipeline and tensor parallelism, to overcome the memory and communication bottlenecks of long-sequence training. This enables larger, more accurate models in fields like climate modeling and structural biology, where training data involves very long sequences that were previously intractable.
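To make the data flow concrete, here is a minimal sketch (not the DeepSpeed implementation) of Ulysses-style attention in PyTorch. It assumes an already-initialized torch.distributed process group and that both the sequence length and the number of heads are divisible by the world size; the helper name _all_to_all_4d and the tensor layout are illustrative assumptions.

```python
import torch
import torch.distributed as dist


def _all_to_all_4d(x: torch.Tensor, scatter_dim: int, gather_dim: int) -> torch.Tensor:
    """Split x along scatter_dim across ranks and regather the received
    pieces along gather_dim (identity when running on a single rank)."""
    world_size = dist.get_world_size()
    if world_size == 1:
        return x
    inputs = [t.contiguous() for t in x.chunk(world_size, dim=scatter_dim)]
    outputs = [torch.empty_like(inputs[0]) for _ in range(world_size)]
    dist.all_to_all(outputs, inputs)
    return torch.cat(outputs, dim=gather_dim)


def ulysses_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: [batch, local_seq, heads, head_dim], partitioned along the
    sequence dimension across ranks.

    1) an all-to-all converts the sequence partition into a head partition,
       so every rank holds the full sequence for heads/world_size heads;
    2) ordinary attention runs locally over the full sequence;
    3) a second all-to-all restores the original sequence partition.
    """
    q, k, v = (_all_to_all_4d(t, scatter_dim=2, gather_dim=1) for t in (q, k, v))
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)  # -> [b, h, s, d]
    ).transpose(1, 2)  # back to [batch, full_seq, local_heads, head_dim]
    return _all_to_all_4d(out, scatter_dim=1, gather_dim=2)
```

Because the two all-to-alls only trade a sequence partition for a head partition and back, the local attention computation itself is unchanged, which is why this scheme composes with efficient attention kernels as well as with other parallelism strategies.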
Papers
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He