Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism [2111.10770]