Paper ID: 2303.17475
Efficient distributed representations with linear-time attention scores normalization
Lorenzo Dall'Amico, Enrico Maria Belliardo
The attention score matrix ${\rm SoftMax}(XY^T)$ encodes relational similarity patterns between objects and is extremely popular in machine learning. However, the complexity required to calculate it runs quadratically with the problem size, making it a computationally heavy solution. In this article, we propose a linear-time approximation of the attention score normalization constants for embedding vectors with bounded norms. We show on several pre-trained embeddings that the accuracy of our estimation formula surpasses competing kernel methods by even orders of magnitude. From this result, we design a linear-time and task-agnostic embedding algorithm based on the optimization of the attention scores. The proposed algorithm is highly interpretable and easily adapted to an arbitrary embedding problem. We consider a few use-cases and observe similar or higher performances and a lower computational time with respect to comparable embedding algorithms.
Submitted: Mar 30, 2023