Paper ID: 2502.05475 • Published Feb 8, 2025
You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam...
In this position paper, we argue that understanding the relation between
structure in the data distribution and structure in trained models is central
to AI alignment. First, we discuss how two neural networks can have equivalent
performance on the training set but compute their outputs in essentially
different ways and thus generalise differently. For this reason, standard
testing and evaluation are insufficient for obtaining assurances of safety for
widely deployed generally intelligent systems. We argue that to progress beyond
evaluation to a robust mathematical science of AI alignment, we need to develop
statistical foundations for an understanding of the relation between structure
in the data distribution, internal structure in models, and how these
structures underlie generalisation.
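To make the core claim concrete, here is a minimal sketch (ours, not from the paper, using interpolants in place of neural networks) of two models that fit the same training points exactly yet compute their outputs in structurally different ways and so generalise differently off the training set:

```python
import numpy as np

# Shared training data: both models will fit these points exactly.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(3.0 * x_train)  # underlying target function

# Model A: a degree-(n-1) polynomial interpolant through the points.
coeffs = np.polyfit(x_train, y_train, deg=len(x_train) - 1)
model_a = lambda x: np.polyval(coeffs, x)

# Model B: a piecewise-linear interpolant through the same points.
model_b = lambda x: np.interp(x, x_train, y_train)

# Both models have (numerically) zero training error ...
print("train error A:", np.max(np.abs(model_a(x_train) - y_train)))
print("train error B:", np.max(np.abs(model_b(x_train) - y_train)))

# ... but they disagree between and beyond the training points,
# so any evaluation confined to the training distribution cannot
# distinguish them.
x_test = np.linspace(-1.2, 1.2, 9)
print("max off-grid disagreement:",
      np.max(np.abs(model_a(x_test) - model_b(x_test))))
```

The disagreement is largest outside the training interval, where the polynomial extrapolates wildly while the piecewise-linear model clamps to its endpoint values. This is the sense in which training-set performance alone underdetermines generalisation, which is the paper's motivation for studying how data structure shapes internal model structure.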