Length Extrapolation
Length extrapolation research aims to let sequence models, most prominently transformer-based large language models (LLMs), accurately process sequences significantly longer than those seen during training. Current work concentrates on improving positional encoding methods, including kernel-based formulations and adaptive or data-driven schemes, and also extends to non-transformer architectures such as the Mamba state space model. This matters for real-world applications involving lengthy text, such as legal documents or scientific papers, where fixed context-length limits degrade performance. Progress in this area would considerably broaden the applicability of large language models.
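As a concrete illustration of a positional scheme designed with extrapolation in mind, the sketch below implements ALiBi (Attention with Linear Biases). ALiBi adds no position embeddings to the tokens. Instead, it adds a head-specific penalty proportional to the query-key distance to the attention scores. This is a minimal pure-Python sketch, assuming causal attention and a power-of-two number of heads. The function names `alibi_slopes`, `alibi_bias`, and `attention_weights` are hypothetical helpers chosen for this example, not a library API, and the sketch is not taken from any of the papers listed on this page.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Head-specific slopes m_h = 2^(-8(h+1)/num_heads).

    Assumes num_heads is a power of two; ALiBi uses a different slope
    construction for other head counts, which this sketch omits.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch assumes num_heads is a power of two")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal bias matrix for one head.

    Entry (i, j) is -slope * (i - j) for keys j <= i, so more distant
    keys receive a larger penalty; future keys (j > i) are masked with -inf.
    """
    return [
        [-slope * (i - j) if j <= i else -math.inf for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attention_weights(scores: list[list[float]], bias: list[list[float]]) -> list[list[float]]:
    """Row-wise softmax of (scores + bias), as in biased dot-product attention."""
    out = []
    for s_row, b_row in zip(scores, bias):
        logits = [s + b for s, b in zip(s_row, b_row)]
        mx = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - mx) for x in logits]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    # The bias is a closed-form function of distance, so it can be built for
    # any sequence length, including lengths longer than a (hypothetical)
    # training length; there is no learned position table to run out of.
    bias = alibi_bias(5, slopes[0])
    weights = attention_weights([[0.0] * 5 for _ in range(5)], bias)
    print([round(w, 3) for w in weights[-1]])
```

The design choice that matters for extrapolation is that the penalty depends only on relative distance and is computed rather than learned, so nothing in the code depends on a maximum length. Whether model quality actually holds up beyond the training length is an empirical question.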
Papers
December 25, 2024
November 5, 2024
October 11, 2024
October 10, 2024
October 7, 2024
July 21, 2024
June 20, 2024
May 27, 2024
May 23, 2024
April 18, 2024
March 26, 2024
March 4, 2024
January 29, 2024
December 28, 2023
October 25, 2023
July 19, 2023
May 5, 2023
December 20, 2022