Discrete Speech Unit
Discrete speech units (DSUs) represent speech as a sequence of discrete tokens rather than continuous features, with the aim of improving efficiency and performance in speech processing tasks. Current research focuses on how to select and optimize DSUs for applications such as speech-to-speech translation and automatic speech recognition, often using self-supervised learning and transformer-based architectures. Because token sequences are more compact than continuous features, the approach can yield smaller and more efficient models, which is particularly useful in resource-constrained and real-time settings, and it may also improve the quality and robustness of speech synthesis and translation systems. DSUs are being evaluated across diverse languages and tasks, which informs both the fundamental understanding of speech representation and practical applications.
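As a rough illustration of what "a sequence of discrete tokens" means in practice, the sketch below maps frame-level feature vectors to the index of their nearest centroid and then collapses consecutive repeats. It assumes the common recipe of clustering self-supervised features (for example with k-means); the centroids, the 2-dimensional toy features, and the function names `assign_units` and `collapse_repeats` are hypothetical and are not taken from any of the listed papers.

```python
import numpy as np

def assign_units(features, centroids):
    """Map each frame (row of `features`) to the index of its nearest centroid."""
    # Squared Euclidean distance between every frame and every centroid,
    # computed via broadcasting: result has shape (num_frames, num_centroids).
    diffs = features[:, None, :] - centroids[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

def collapse_repeats(units):
    """Remove consecutive duplicate unit IDs, e.g. [0, 0, 1, 1] -> [0, 1]."""
    units = np.asarray(units)
    if units.size == 0:
        return units
    keep = np.concatenate(([True], units[1:] != units[:-1]))
    return units[keep]

# Hypothetical codebook of 3 centroids in a 2-D feature space.
# In a real system these would come from clustering learned speech features.
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

# Hypothetical frame-level features for six frames of audio.
features = np.array([
    [0.1, -0.1],
    [0.0, 0.2],
    [0.9, 1.1],
    [1.0, 0.8],
    [0.1, 0.9],
    [0.05, 0.0],
])

units = assign_units(features, centroids)    # one unit ID per frame
deduped = collapse_repeats(units)            # shorter sequence without repeats
print(units.tolist(), deduped.tolist())
```

Collapsing repeats is optional: it shortens the sequence, which helps efficiency, but discards duration information that some synthesis systems need.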
Papers
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee