Long Sequence Transformer

Long-sequence transformers aim to overcome a key limitation of standard transformers, which struggle to process very long inputs because self-attention scales quadratically with sequence length. Current research focuses on efficient algorithms and architectures, such as sparse attention mechanisms (e.g., in Longformer and BigBird) and distributed training methods, that make training and inference feasible on extremely long sequences. This work matters because capturing long-range dependencies is crucial for tasks involving long texts, such as document classification and clinical NLP, and the efficiency gains also translate into faster training and inference.
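
To make the sparse-attention idea concrete, below is a minimal sketch of sliding-window (local) self-attention, the pattern at the core of models like Longformer. The function name `sliding_window_attention` and all variable names are illustrative, not the API of any library, and real implementations add global tokens and blocked GPU kernels rather than a Python loop. The point of the sketch is the cost: each query attends only to keys within a fixed window, so the work grows linearly with sequence length instead of quadratically.

```python
# A minimal sketch of sliding-window (local) self-attention, the core idea
# behind sparse-attention models such as Longformer. Names are illustrative,
# not taken from any library.
import numpy as np

def sliding_window_attention(Q, K, V, window: int):
    """Each query attends only to keys within +/- `window` positions.

    Q, K, V: arrays of shape (seq_len, d). Cost is O(seq_len * window * d)
    rather than the O(seq_len^2 * d) of full self-attention.
    """
    seq_len, d = Q.shape
    out = np.zeros_like(V)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)   # scores for the local window only
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]               # weighted sum of local values
    return out

# Example: 4096 tokens with a 128-token window computes about 4096 * 257
# score entries, versus 4096^2 (~16.8M) for full attention.
rng = np.random.default_rng(0)
seq_len, d = 4096, 64
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
context = sliding_window_attention(Q, K, V, window=128)
print(context.shape)  # (4096, 64)
```
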

Papers