Paper ID: 2405.16630
Bayesian Inference with Deep Weakly Nonlinear Networks
Boris Hanin, Alexander Zlokapa
We show at a physics level of rigor that Bayesian inference with a fully connected neural network and a shaped nonlinearity of the form $\phi(t) = t + \psi t^3/L$ is (perturbatively) solvable in the regime where the number of training datapoints $P$, the input dimension $N_0$, the network layer widths $N$, and the network depth $L$ are simultaneously large. Our results hold with weak assumptions on the data; the main constraint is that $P < N_0$. We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature. We report the following results from the first-order computation:

1. When the width $N$ is much larger than the depth $L$ and the training set size $P$, neural network Bayesian inference coincides with Bayesian inference using a kernel. The value of $\psi$ determines the curvature of the sphere, hyperbola, or plane into which the training data is implicitly embedded under the feature map.

2. When $LP/N$ is a small constant, neural network Bayesian inference departs from the kernel regime. At zero temperature, it is equivalent to Bayesian inference using a data-dependent kernel, and $LP/N$ serves as an effective depth that controls the extent of feature learning.

3. In the restricted case of deep linear networks ($\psi = 0$) and noisy data, we show a simple data model for which the evidence and generalization error are optimal at zero temperature. As $LP/N$ increases, both the evidence and the generalization further improve, demonstrating the benefit of depth in benign overfitting.
Submitted: May 26, 2024
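
To make the model class in the abstract concrete, here is a minimal NumPy sketch (not taken from the paper) of a depth-$L$ fully connected network with the shaped nonlinearity $\phi(t) = t + \psi t^3/L$, evaluated on samples from an i.i.d. Gaussian prior over the weights. The specific width, depth, and $1/\mathrm{fan\_in}$ weight scaling are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

def shaped_phi(t, psi, L):
    """Shaped weakly nonlinear activation phi(t) = t + psi * t^3 / L."""
    return t + psi * t**3 / L

def sample_prior_network_output(X, N, L, psi, rng):
    """Draw one network from an i.i.d. Gaussian prior and evaluate it on inputs X.

    X   : (P, N_0) array of P inputs in dimension N_0.
    N   : hidden-layer width (all hidden layers have width N here).
    L   : network depth (number of hidden layers).
    psi : shaping parameter of the cubic term.
    Returns a length-P vector of scalar network outputs.
    """
    h = X
    width_in = X.shape[1]
    for _ in range(L):
        # Variance 1/fan_in keeps preactivations O(1) as the widths grow (an assumed convention).
        W = rng.normal(0.0, 1.0 / np.sqrt(width_in), size=(width_in, N))
        h = shaped_phi(h @ W, psi, L)
        width_in = N
    # Scalar linear readout layer.
    v = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, 1))
    return (h @ v).ravel()

# Example: compare psi < 0, psi = 0 (deep linear network), and psi > 0 on random inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 32))  # P = 8 datapoints, N_0 = 32
for psi in (-1.0, 0.0, 1.0):
    f = sample_prior_network_output(X, N=256, L=16, psi=psi, rng=rng)
    print(f"psi={psi:+.1f}  prior output std={f.std():.3f}")
```

In this sketch $\psi = 0$ recovers a deep linear network, while the $\psi t^3/L$ term contributes only a weak per-layer correction whose cumulative effect over $L$ layers remains of order one, which is the sense in which the nonlinearity is "shaped" with depth.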