Dialect Datasets
Dialect datasets are key resources for natural language processing (NLP) because they make it possible to build language technologies that account for linguistic variation. Current research focuses on creating and improving these datasets across many languages and for tasks such as dialect identification, speech recognition, and machine translation, often using transformer-based models and other deep learning architectures. High-quality, representative dialect datasets are essential for mitigating bias in NLP systems and for building language technologies that serve different communities equitably and effectively.