Exploring Scaling Laws for Local SGD in Large Language Model Training [2409.13198]