Dot Product Attention

Dot product attention is a core mechanism in transformer-based models: it relates data elements (e.g., words in a sentence, pixels in an image) by taking dot products between their query and key vector representations and using the resulting similarity scores to weight a sum of value vectors. Current research focuses on improving its efficiency and robustness, exploring alternatives such as elliptical attention (which measures similarity with a Mahalanobis distance), symmetric dot-product attention, and variants that employ Hadamard products or replace softmax with ReLU and addition. These advances aim to improve model performance, reduce computational cost, and address limitations such as representation collapse and vulnerability to adversarial attacks, with impact across fields from natural language processing and computer vision to physical simulation.
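For concreteness, below is a minimal NumPy sketch of the standard scaled dot-product attention that these variants modify. It is illustrative only: the function name and shapes are assumptions, not taken from any particular paper listed here.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention sketch.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    Returns the (n_q, d_v) attended output and the (n_q, n_k) weights.
    """
    d_k = Q.shape[-1]
    # Dot-product similarity of every query with every key, scaled by
    # sqrt(d_k) to keep softmax gradients stable as dimensionality grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns similarity scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weight-averaged combination of the value vectors.
    return weights @ V, weights

# Example: 4 tokens with 8-dimensional embeddings attending to each other.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(X, X, X)
print(w.sum(axis=-1))  # each row of weights sums to 1
```

The alternatives surveyed above change specific pieces of this computation: elliptical attention swaps the dot product in `scores` for a Mahalanobis-style distance, while the softmax-free variants replace the exponentiation-and-normalization step with ReLU or simple addition.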

Papers