Training Instability
Training instability in large-scale machine learning models, particularly deep neural networks and transformers, is a major obstacle to reliable model development and deployment. Current research focuses on identifying and mitigating its sources, such as limited numerical precision in algorithms like Adam and Flash Attention, the interplay between optimizers and normalization layers (e.g., Batch Normalization), and data heterogeneity in federated learning. Addressing these issues makes training more robust and efficient, which in turn yields more reliable and accurate models across applications.
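As a rough illustration of the numerical precision point, the sketch below shows one way an Adam-style update can break down in half precision. It is not taken from any of the papers listed here. The helper name `adam_denominator`, the gradient value of 1e-4, and the omission of moment averaging and bias correction are all simplifying assumptions made for the example. Only the commonly used default of eps = 1e-8 reflects typical practice.

```python
import numpy as np

def adam_denominator(grad, eps=1e-8, dtype=np.float32):
    """Hypothetical helper: sqrt(g^2) + eps computed entirely in `dtype`.

    This mimics the denominator of an Adam-style step for a single
    gradient value, with moment averaging and bias correction omitted.
    """
    g = dtype(grad)
    v = g * g                        # squared gradient (second-moment term)
    return np.sqrt(v) + dtype(eps)   # eps is also cast to the working dtype

small_grad = 1e-4                    # assumed value, chosen so that g^2 = 1e-8

denom_fp32 = adam_denominator(small_grad, dtype=np.float32)
denom_fp16 = adam_denominator(small_grad, dtype=np.float16)
eps_fp16 = np.float16(1e-8)          # below the smallest float16 subnormal

print("float32 denominator:", denom_fp32)
print("float16 denominator:", denom_fp16)
print("eps cast to float16:", eps_fp16)
```

In float16, both the squared gradient and eps fall below the smallest representable positive value and round to zero, so the denominator vanishes and the resulting step would be undefined or unbounded. In float32 the same computation stays well behaved. Mixed-precision setups typically sidestep this by keeping optimizer state in higher precision or by using a larger eps.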