Paper ID: 2406.05666

General Distribution Learning: A theoretical framework for Deep Learning

Binchuan Qi

This paper introduces General Distribution Learning (GD learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression, and parameter estimation. GD learning focuses on estimating the true underlying probability distribution of the dataset and using models to fit the estimated parameters of that distribution. The learning error in GD learning is thus decomposed into two distinct components: estimation error and fitting error. The estimation error, which stems from finite sampling, limited prior knowledge, and the inherent limitations of the estimation algorithm, quantifies the discrepancy between the true distribution and its estimate. The fitting error, which measures the deviation of the model output from the fitting objective, is attributed to the limited capacity of the model and the performance limitations of the optimization algorithm. To address the challenge of non-convexity in the optimization of the learning error, we introduce the standard loss function and demonstrate that, when this function is employed, globally optimal solutions of the non-convex optimization problem can be approached by minimizing the gradient norm and the structural error. Moreover, we show that the estimation error is determined by the uncertainty of the estimate $q$, and propose the minimum uncertainty principle to obtain an optimal estimate of the true distribution. We further provide upper bounds for the estimation error, the fitting error, and the learning error within the GD learning framework. Finally, our findings are applied to offer theoretical explanations for several open questions in deep learning, including overparameterization, non-convex optimization, flat minima, the dynamical isometry condition, and other techniques.
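A schematic reading of the error decomposition described above may help fix ideas; this is a minimal sketch, and the symbols $p$ (true distribution), $q$ (its estimate), $f_\theta$ (model output), and the discrepancy measure $d(\cdot,\cdot)$ are assumed notation rather than the paper's own, with the bound holding only when $d$ satisfies a triangle inequality:
\[
  \underbrace{d\bigl(p, f_\theta\bigr)}_{\text{learning error}}
  \;\le\;
  \underbrace{d\bigl(p, q\bigr)}_{\text{estimation error}}
  \;+\;
  \underbrace{d\bigl(q, f_\theta\bigr)}_{\text{fitting error}}
\]
Under this reading, the estimation term depends only on the data and the estimation procedure, while the fitting term depends only on the model class and the optimizer, which is why the two can be bounded separately.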

Submitted: Jun 9, 2024