Modular Addition

Modular addition, a seemingly simple arithmetic operation, turns out to be surprisingly challenging for machine learning models, particularly at scale. Current research examines how neural networks, including transformers and multi-layer perceptrons, learn to perform modular addition: the training dynamics involved, the emergence of interpretable algorithms (such as grid and circular structures in embedding spaces), and the phenomenon of "grokking", a sudden improvement in generalization that appears only after an initial period of overfitting. These studies aim to make machine learning models for modular arithmetic more efficient and interpretable, with implications for cryptography and other fields that require efficient computation of modular operations.
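
The task itself is easy to state: given two residues a and b, predict (a + b) mod p. The sketch below is illustrative rather than drawn from any single paper; the modulus p = 113, the 30% train split, and the use of NumPy are assumptions. It builds the full dataset of pairs, as is typical in grokking experiments, and then demonstrates the circular ("clock") structure mentioned above, in which each residue maps to an angle on a circle and modular addition becomes rotation.

```python
# Minimal sketch of the modular addition task (assumed settings: p = 113,
# 30% of pairs used for training; not taken from a specific paper).
import itertools
import random
import numpy as np

p = 113                      # modulus (illustrative choice of prime)
train_fraction = 0.3         # illustrative train split

# 1) The task: predict (a + b) mod p from the pair (a, b).
pairs = [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]
random.seed(0)
random.shuffle(pairs)
split = int(train_fraction * len(pairs))
train, test = pairs[:split], pairs[split:]
print(f"{len(train)} train / {len(test)} held-out examples")

# 2) The circular structure: mapping residue a to the angle theta * a places
#    it on a circle, where modular addition is rotation. Adding the angles and
#    reading the result back off the circle recovers (a + b) mod p exactly.
k = 1                                   # any frequency 1 <= k < p works
theta = 2 * np.pi * k / p
a, b = 45, 97
angle = np.arctan2(np.sin(theta * a + theta * b),
                   np.cos(theta * a + theta * b))   # wrap angle to (-pi, pi]
recovered = int(round((angle / theta) % p)) % p
assert recovered == (a + b) % p
```

A small MLP or transformer trained on the train split above, with held-out accuracy tracked over long training runs, is the usual setup in which grokking is observed; the angle-addition check is one way to see why circular embeddings are a natural solution for the network to converge to.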

Papers