Staleness Problem
The "staleness problem" arises in distributed machine learning, particularly federated learning and asynchronous training, where delayed or outdated model updates from participating nodes hinder efficient model convergence. Current research focuses on mitigating staleness through techniques like weighted aggregation of updates based on staleness, optimizer-dependent weight prediction to maintain consistency, and gradient inversion to reconstruct non-stale updates from stale information. Addressing staleness is crucial for improving the efficiency and accuracy of distributed machine learning systems, impacting applications ranging from large-scale model training to resource-constrained mobile devices.