Streaming Transformer

Streaming Transformers are deep learning models that process sequential data in real time or near real time, emitting outputs as input arrives rather than after the full sequence is available. They address two limitations of standard Transformers: full (bidirectional) self-attention needs the entire input before it can produce output, and its cost grows quadratically with sequence length, which makes long or unbounded sequences impractical. Current research adapts Transformer architectures, including decoder-only models and models with cumulative or blockwise attention mechanisms, to applications such as speech recognition, machine translation, and video understanding. The emphasis on efficient, low-latency processing matters most for real-time audio processing and interactive systems, where it enables faster and more responsive AI applications.
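To make the blockwise attention idea concrete, here is a minimal sketch of one way such an attention mask could be built. It is an illustration under stated assumptions, not the method of any particular paper: the function name `blockwise_mask` and the parameters `block_size` and `left_blocks` are hypothetical, and the rule assumed here is that each position may attend to every position in its own block plus a fixed number of preceding blocks, and never to later blocks.

```python
def blockwise_mask(seq_len, block_size, left_blocks=1):
    """Build a boolean mask where mask[i][j] is True if query position i
    may attend to key position j.

    Assumed rule (illustrative only): a query sees its own block and up to
    `left_blocks` blocks before it, and never any later block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if left_blocks < 0:
        raise ValueError("left_blocks must be non-negative")

    mask = []
    for i in range(seq_len):
        q_block = i // block_size  # block index of the query position
        row = []
        for j in range(seq_len):
            k_block = j // block_size  # block index of the key position
            row.append(q_block - left_blocks <= k_block <= q_block)
        mask.append(row)
    return mask


# Six positions, blocks of two, one block of left context.
mask = blockwise_mask(seq_len=6, block_size=2, left_blocks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# xx....
# xx....
# xxxx..
# xxxx..
# ..xxxx
# ..xxxx
```

Under this rule the model has to wait only for the current block to fill before computing attention, so the added latency is bounded by `block_size`, and the per-position cost is bounded by `(left_blocks + 1) * block_size` keys instead of growing with the full sequence length.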

Papers