Discrete Speech Unit

Discrete speech units (DSUs) represent speech as a sequence of discrete tokens rather than continuous feature vectors, with the aim of improving efficiency and performance in speech processing tasks. Current research focuses on optimizing the criteria for selecting DSUs in applications such as speech-to-speech translation and automatic speech recognition, often using self-supervised learning and transformer-based architectures. Because token sequences are compact, this approach can yield smaller and more efficient models, which is particularly useful in resource-constrained environments and real-time applications, and it may also improve the quality and robustness of speech synthesis and translation systems. DSUs are being evaluated across diverse languages and tasks, which advances both the fundamental understanding of speech representation and its practical applications.
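
To make the idea of "speech as a sequence of discrete tokens" concrete, here is a minimal sketch in Python. It assumes one common recipe that the summary above does not specify: each frame-level feature vector is replaced by the index of its nearest centroid in a codebook, and consecutive repeated indices are then collapsed. The two-dimensional features, the hand-written `codebook`, and the helper names `quantize` and `collapse_repeats` are all hypothetical; real systems typically learn the codebook (for example with k-means) from the features of a self-supervised encoder.

```python
from itertools import groupby


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook centroid."""
    units = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every centroid.
        dists = [
            sum((f - c) ** 2 for f, c in zip(frame, centroid))
            for centroid in codebook
        ]
        units.append(dists.index(min(dists)))
    return units


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Hypothetical codebook with three centroids (assumption: toy 2-D features).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Hypothetical frame-level features standing in for encoder outputs.
frames = [
    [0.1, -0.1],
    [0.0, 0.2],
    [0.9, 1.1],
    [1.0, 0.8],
    [0.1, 0.9],
    [0.05, 0.1],
]

units = quantize(frames, codebook)
deduped = collapse_repeats(units)

print(units)    # [0, 0, 1, 1, 2, 0]
print(deduped)  # [0, 1, 2, 0]
```

In this toy case six frames reduce to four tokens after collapsing repeats, which illustrates where the compactness mentioned above can come from. Whether repeats should be collapsed at all depends on the downstream task, since doing so discards duration information.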

Papers