Softmax Function
The softmax function, a core component of many machine learning models, transforms a vector of arbitrary real numbers into a probability distribution, enabling classification and other probabilistic tasks. Current research addresses its limitations, including sensitivity to outliers, the "softmax bottleneck" in large language models, and its computational cost in high-dimensional spaces. Proposed remedies range from alternatives such as sigmoid functions to modifications such as adaptive temperature scaling, aimed at better efficiency and calibration. These efforts seek to enhance the performance, robustness, and scalability of machine learning architectures, particularly in applications involving large datasets and long sequences, such as natural language processing and medical image analysis.
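For concreteness, the standard definition is softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The sketch below is a minimal numerically stable implementation with a fixed temperature parameter, illustrating the kind of knob that adaptive temperature-scaling methods tune automatically; the function name, the default temperature, and the max-subtraction stabilization trick are illustrative choices, not details taken from any particular paper discussed above.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Map a vector of real-valued scores to a probability distribution.

    Subtracting the max before exponentiating guards against overflow;
    it leaves the result unchanged because softmax is invariant to
    adding a constant to every input.
    """
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max()          # numerical stability: exp() never sees large args
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = [2.0, 1.0, 0.1]
print(softmax(scores))                    # ~[0.659 0.242 0.099]
print(softmax(scores, temperature=0.5))   # sharper (lower entropy)
print(softmax(scores, temperature=5.0))   # flatter (higher entropy)
```

Lowering the temperature concentrates probability mass on the largest score, while raising it flattens the distribution toward uniform; calibration-oriented work exploits exactly this trade-off.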