Softmax Function
The softmax function, a crucial component of many machine learning models, transforms a vector of arbitrary real numbers into a probability distribution, enabling classification and other probabilistic tasks. Current research addresses its limitations, such as sensitivity to outliers, the "softmax bottleneck" in large language models, and its computational cost in high-dimensional spaces, by exploring alternatives such as sigmoid functions and adaptive temperature scaling, as well as modifications that improve efficiency and calibration. These efforts aim to enhance the performance, robustness, and scalability of machine learning architectures, particularly in applications involving large datasets and long sequences, such as natural language processing and medical image analysis.
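As a concrete illustration of the definition above, here is a minimal sketch of a numerically stable softmax in NumPy. The temperature parameter is an assumption added for this sketch (it is not drawn from the listed papers); it shows the knob that temperature-scaling approaches adjust to sharpen or flatten the output distribution.

```python
import numpy as np

def softmax(z: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Map a vector of arbitrary reals to a probability distribution.

    Subtracting the max before exponentiating avoids overflow without
    changing the result. `temperature` is a hypothetical knob for this
    sketch: values < 1 sharpen the distribution, values > 1 flatten it.
    """
    scaled = z / temperature
    shifted = scaled - np.max(scaled)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Example: logits -> probabilities that sum to 1
logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # ~[0.659, 0.242, 0.099]
```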
Papers
Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, Margret Keuper