Dot-Product Self-Attention
Dot product self-attention is a core mechanism in Transformer networks, enabling them to process sequential data by weighting the importance of different input elements. Current research focuses on addressing limitations of standard dot-product attention, such as quadratic computational complexity, susceptibility to representation collapse, and overconfidence in predictions, through methods like elliptical attention, optimal transport-based alternatives (e.g., SeTformer), and Lipschitz regularization. These advancements aim to improve the efficiency, robustness, and calibration of Transformer models across diverse applications, including image recognition, natural language processing, and sequential recommendation.
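As a concrete illustration of the base mechanism these works build on, below is a minimal NumPy sketch of scaled dot-product self-attention; the function and variable names are illustrative and not drawn from any of the cited papers. Note the (seq_len × seq_len) score matrix, which is the source of the quadratic cost mentioned above.

```python
import numpy as np

def scaled_dot_product_self_attention(X, W_q, W_k, W_v):
    """Minimal scaled dot-product self-attention over a sequence X.

    X:             (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices
    Returns:       (seq_len, d_k) attention output
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = K.shape[-1]

    # Pairwise similarity scores, scaled by sqrt(d_k) to stabilize the softmax.
    # This (seq_len, seq_len) matrix is where the quadratic complexity arises.
    scores = Q @ K.T / np.sqrt(d_k)

    # Row-wise softmax turns scores into attention weights over the sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output token is a weighted combination of the value vectors.
    return weights @ V

# Example: 4 tokens with 8-dimensional embeddings projected to d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(scaled_dot_product_self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```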