Epoch-Wise Double Descent

Epoch-wise double descent is the surprising phenomenon in which a model's test error, tracked over the course of training, first decreases, then rises as the model begins to overfit the training data (often by fitting noisy labels), and then decreases a second time with continued training, producing two distinct descents in the generalization curve. Current research seeks to explain this behavior across architectures ranging from linear models and two-layer neural networks to deep networks such as Transformers, investigating the roles of input variance, the singular values of the data covariance matrix, and the evolution of learned representations across layers. Understanding when and why the second descent occurs matters in practice: it informs early-stopping and training-budget decisions, potentially yielding better generalization and more efficient use of compute in diverse applications, including time series forecasting.
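The phenomenon can be observed directly by logging test error per epoch in a long training run. Below is a minimal, self-contained sketch of such an experiment, assuming a synthetic teacher-student setup with 20% label noise and a two-layer ReLU network trained by full-batch gradient descent; all names, sizes, and hyperparameters here are illustrative choices, not from any of the surveyed papers, and the curve may need tuning (width, learning rate, noise level, training length) before a clear second descent appears.

```python
# Sketch: watch train/test error per epoch for epoch-wise double descent.
# Assumptions (hypothetical, for illustration): synthetic linear-teacher data,
# 20% training-label noise, width-100 two-layer ReLU net, squared loss,
# full-batch gradient descent. Label noise is what typically drives the
# intermediate rise in test error before the second descent.
import numpy as np

rng = np.random.default_rng(0)

d, n_train, n_test = 30, 200, 2000
w_true = rng.normal(size=d)

def make_data(n, noise):
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true)
    flip = rng.random(n) < noise  # randomly flip a fraction of labels
    y[flip] *= -1
    return X, y

X_tr, y_tr = make_data(n_train, noise=0.2)
X_te, y_te = make_data(n_test, noise=0.0)  # clean labels for evaluation

h = 100
W1 = rng.normal(size=(d, h)) / np.sqrt(d)
W2 = rng.normal(size=(h, 1)) / np.sqrt(h)
lr = 0.05

def forward(X):
    A = np.maximum(X @ W1, 0.0)  # hidden ReLU activations
    return A, (A @ W2).ravel()

for epoch in range(1, 20001):
    A, out = forward(X_tr)
    err = out - y_tr
    # Backpropagate the squared-loss gradient through both layers.
    gW2 = A.T @ err[:, None] / n_train
    gA = err[:, None] @ W2.T * (A > 0)
    gW1 = X_tr.T @ gA / n_train
    W2 -= lr * gW2
    W1 -= lr * gW1
    if epoch % 1000 == 0:
        _, out_te = forward(X_te)
        train_err = np.mean(np.sign(out) != y_tr)
        test_err = np.mean(np.sign(out_te) != y_te)
        print(f"epoch {epoch:6d}  train_err {train_err:.3f}  test_err {test_err:.3f}")
# Under favorable settings, test_err dips, rises as the noisy labels are
# memorized, then descends again late in training: two descents in one run.
```

Plotting the logged test error against epoch makes the two descents visible; the same logging pattern carries over unchanged to deeper architectures.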

Papers