Long Sequence Transformer
Long-sequence transformers aim to overcome a key limitation of standard transformers, which struggle to process very long inputs because self-attention scales quadratically in time and memory with sequence length. Current research focuses on efficient algorithms and architectures, such as sparse attention mechanisms (e.g., in Longformer and BigBird) and distributed training methods, that enable training and inference on extremely long sequences. This work matters because it improves performance on tasks involving long texts, such as document classification and clinical NLP, where capturing long-range dependencies is crucial, while also reducing training and inference time.
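To make the idea of sparse attention concrete, below is a minimal sketch of sliding-window (local) attention, the pattern popularized by Longformer, written in PyTorch. It is illustrative only and not any library's implementation: this naive version still materializes the full n×n score matrix and only demonstrates the banded masking pattern, whereas efficient implementations compute scores only inside the band to reach roughly O(n·w) cost. The function name, shapes, and window size are assumptions chosen for the example.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Banded self-attention sketch: each query attends only to keys within
    `window` positions on either side. Efficient kernels exploit this band to
    avoid the O(n^2) cost; this toy version only shows the masking pattern.

    q, k, v: tensors of shape (batch, seq_len, dim).
    """
    b, n, d = q.shape
    # Full score matrix (the part real implementations avoid building).
    scores = torch.matmul(q, k.transpose(-1, -2)) / d ** 0.5  # (b, n, n)

    # Boolean band mask: True where |i - j| <= window.
    idx = torch.arange(n)
    band = (idx[None, :] - idx[:, None]).abs() <= window       # (n, n)

    # Block attention outside the local window, then normalize.
    scores = scores.masked_fill(~band, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)

# Toy usage: 1,024 tokens, 64-dim vectors, a window of 128 tokens per side.
x = torch.randn(2, 1024, 64)
out = sliding_window_attention(x, x, x, window=128)
print(out.shape)  # torch.Size([2, 1024, 64])
```

Models such as Longformer and BigBird combine this local pattern with a small number of global or random attention positions so that information can still propagate across the whole sequence.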