Dialect Datasets
Dialect datasets are crucial resources for advancing natural language processing (NLP), because they enable language technologies that account for the linguistic variation within a language. Current research focuses on creating and improving these datasets for many languages and for tasks such as dialect identification, speech recognition, and machine translation, often using transformer-based models and other deep learning architectures. High-quality, representative dialect datasets are essential for mitigating bias in NLP systems and for building more equitable and effective language technologies across different communities.
Papers
October 4, 2024
October 3, 2024
July 4, 2024
June 25, 2024
June 14, 2024
June 11, 2024
May 2, 2024
March 27, 2024
March 16, 2024
December 16, 2023
November 30, 2023
November 5, 2023
November 2, 2023
October 31, 2023
August 1, 2023
July 23, 2023
July 14, 2023
June 28, 2023