Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data [2306.13840]