Non Standardized Dialect

Non-standardized dialects pose significant challenges for natural language processing (NLP) because their orthography, pronunciation, and grammar vary from those of standard forms. Current research focuses on robust methods for dialect identification and evaluation, using transformer-based models and self-supervised speech models to analyze both text and audio data. This work matters for improving the accuracy of downstream NLP tasks such as machine translation and speech recognition, and for understanding linguistic variation and its sociolinguistic correlates, such as the effect of socioeconomic mixing on language use. Building comprehensive benchmark datasets and better evaluation metrics is another active area of investigation.

Papers

November 20, 2024

Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology
Muhammad Sharif, Jiangyan Yi, Muhammad Shoaib
Medical LLM Anti Unification AI Technology Regional Dialect Non Standardized Dialect

November 30, 2023

Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through Dialect Identification using Transformer-based Approach
Vedant Deshpande, Yash Patwardhan, Kshitij Deshpande, Sudeep Mangalvedhekar, Ravindra Murumkar
Speech Recognition Region Specific Shared Task Transformer Based Approach Twitter Dataset Dialect Datasets Dialect Identification Arabic Dialect Identification Non Standardized Dialect

November 28, 2023

A Benchmark for Evaluating Machine Translation Metrics on Dialects Without Standard Orthography
Noëmi Aepli, Chantal Amrhein, Florian Schottmann, Rico Sennrich
New Benchmark Machine Translation Human Translation Regional Dialect Translation Metric Swiss German Non Standardized Dialect

July 19, 2023

When Dialects Collide: How Socioeconomic Mixing Affects Language Use
Thomas Louf, José J. Ramasco, David Sánchez, Márton Karsai
Proxy Metric Social Class Computational Sociolinguistics Non Standardized Dialect

May 19, 2023

North Sámi Dialect Identification with Self-supervised Speech Models
Sofoklis Kakouros, Katri Hiovain-Asikainen
Prosodic Feature Self Supervised Speech Model Dialect Datasets Spoken Language Non Standardized Dialect

November 6, 2021

Finnish Dialect Identification: The Effect of Audio and Text
Mika Hämäläinen, Khalid Alnajjar, Niko Partanen, Jack Rueter
Text Modality Mixed Effect Audio Driven Dialect Identification Dialect Speaker Non Standardized Dialect