Paper ID: 2410.12025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
In this paper, we propose the $\textit{geometric invariance hypothesis (GIH)}$, which argues that when training a neural network, the input space curvature remains invariant under transformation in certain directions determined by its architecture. Starting with a simple non-linear binary classification problem residing on a plane in a high dimensional space, we observe that while an MLP can solve this problem regardless of the orientation of the plane, this is not the case for a ResNet. Motivated by this example, we define two maps that provide a compact $\textit{architecture-dependent}$ summary of the input space geometry of a neural network and its evolution during training, which we dub the $\textbf{average geometry}$ and $\textbf{average geometry evolution}$, respectively. By investigating average geometry evolution at initialization, we discover that the geometry of a neural network evolves according to the projection of data covariance onto average geometry. As a result, in cases where the average geometry is low-rank (such as in a ResNet), the geometry only changes in a subset of the input space. This causes an architecture-dependent invariance property in input-space curvature, which we dub GIH. Finally, we present extensive experimental results to observe the consequences of GIH and how it relates to generalization in neural networks.
Submitted: Oct 15, 2024