BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data [2409.17312]