Speech Discrete Token

Speech discrete tokens represent speech signals as sequences of discrete units. This makes speech cheaper to store and process, and it lets speech be handled with the same sequence-modeling tools used for text. Current research aims to make these tokens both effective and general across tasks such as automatic speech recognition (ASR), text-to-speech (TTS), and voice conversion (VC). This work often builds on self-supervised learning and on transformer-based architectures such as decoder-only models. Discrete speech tokens can be fed to large language models alongside text, so the approach is promising for faster and more accurate speech systems and for multilingual and multimodal applications.
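One common way to obtain such units is to cluster frame-level features from a self-supervised speech model with k-means. Each frame is then replaced by the index of its nearest centroid. This is an assumption for illustration; the overview above does not name a specific tokenizer. The minimal Python sketch below shows only the nearest-centroid assignment and the removal of consecutive repeated units. The codebook and feature values are hypothetical toy numbers. Training the codebook and extracting real features are out of scope.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each frame-level feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def deduplicate(units: Sequence[int]) -> List[int]:
    """Collapse runs of identical consecutive units into a single unit."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


if __name__ == "__main__":
    # Hypothetical 2-D "features" standing in for self-supervised model outputs.
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    frames = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 0.9], [0.1, 0.8], [0.0, 0.1]]
    units = quantize(frames, codebook)
    print(units)               # [0, 0, 1, 1, 2, 0]
    print(deduplicate(units))  # [0, 1, 2, 0]
```

Collapsing repeats shortens the unit sequence and removes some duration information. Whether that trade-off is acceptable depends on the downstream task. For example, a TTS or VC system may need to predict durations separately.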
