Mandarin Speech

Research on Mandarin speech focuses on improving automatic speech recognition (ASR) and text-to-speech (TTS) systems, particularly in challenging scenarios such as children's speech, dialects and related Chinese varieties (e.g., Hakka), and noisy environments. Current work builds on architectures such as the Conformer, self-supervised models such as HuBERT, and large language models (LLMs), often combined with multi-modal and multi-granularity modeling to improve accuracy and robustness. These advances support applications in education, healthcare (e.g., personalized TTS for people with speech impairments), and human-computer interaction, and they also contribute to the preservation and revitalization of under-resourced dialects.

Papers