Shuffling Gradient

Shuffling gradient methods, which process training data in a randomly shuffled order each epoch rather than in a fixed sequential order, are an important class of optimization algorithms widely used in machine learning. Current research focuses on sharpening the theoretical understanding of their convergence properties, particularly for non-convex objectives and for specific architectures such as Vision Mamba models, and on analyzing how shuffling affects the privacy guarantees of differentially private stochastic gradient descent. These investigations aim to make the training of large-scale models more efficient and reliable while addressing practical challenges such as overfitting and training instability arising from interactions with techniques like batch normalization. The results carry implications both for optimization theory and for the practical development of robust machine learning systems.
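To make the distinction concrete, below is a minimal sketch of one common shuffling variant, random reshuffling, applied to a toy least-squares problem. The function name `random_reshuffling_sgd`, the objective, and all hyperparameters are illustrative assumptions, not taken from any specific paper above; the point is only that each epoch draws a fresh permutation and visits every example exactly once, instead of sampling with replacement.

```python
import numpy as np

def random_reshuffling_sgd(X, y, lr=0.01, epochs=20, seed=0):
    """Random-reshuffling SGD for a least-squares objective (illustrative sketch).

    Each epoch visits every example exactly once, in a freshly
    shuffled order, rather than sampling indices with replacement.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Draw a new permutation of the data at the start of each epoch.
        perm = rng.permutation(n)
        for i in perm:
            # Gradient of the per-example loss 0.5 * (x_i @ w - y_i)^2.
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Toy usage: recover a known linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)
print(random_reshuffling_sgd(X, y))  # should be close to w_true
```

Other variants analyzed in this literature differ mainly in how the permutation is chosen (e.g., a single fixed shuffle reused every epoch, or incremental gradient with no shuffling at all), while the per-step update stays the same.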

Papers