Based Romanization

Based romanization, the conversion of non-Latin scripts into the Latin alphabet, is an active research topic because of its implications for data processing and cross-lingual applications. Studies examine how romanization choices affect data linkage accuracy, particularly for tonal languages such as Chinese, and how romanization can improve multilingual automatic speech recognition and large language model performance. The goal is to make data analysis and natural language processing systems more efficient and inclusive by addressing the challenges that diverse writing systems pose and by improving the representation of under-resourced languages.
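As a rough illustration of why romanization choices can matter for data linkage, the sketch below compares two pinyin spellings that differ only in tone. It is a minimal example under stated assumptions, not a method taken from any of the studies: the helper `strip_tone_marks` and the sample names are hypothetical, and only the four pinyin tone diacritics are removed so that the diaeresis in "ü" survives.

```python
import unicodedata

# Combining marks used for the four pinyin tones:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone_marks(pinyin: str) -> str:
    """Hypothetical helper: drop pinyin tone marks, keep other diacritics."""
    # Decompose so each tone mark becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # Recompose so that, for example, u + diaeresis becomes a single "ü".
    return unicodedata.normalize("NFC", kept)


# Hypothetical records: two romanized names that differ only by tone.
names_with_tones = ["mǎ lì", "mà lì"]
names_without_tones = [strip_tone_marks(name) for name in names_with_tones]

# Count how many distinct linkage keys each romanization choice produces.
distinct_with_tones = len(set(names_with_tones))
distinct_without_tones = len(set(names_without_tones))

print(distinct_with_tones, distinct_without_tones)
```

In this toy case the toneless romanization merges two different names into one key, which is the kind of ambiguity that could produce false matches when records are linked on romanized fields.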

Papers