Discrete Speech Representation

Discrete speech representation encodes speech signals as sequences of discrete units, aiming to improve efficiency and performance across speech processing tasks. Current research centers on transformer-based models, both autoregressive and non-autoregressive, typically built on self-supervised learning and incorporating quantization and hierarchical structures. This approach shows promise for automatic speech recognition, speech synthesis, voice conversion, and cross-lingual speech processing, particularly in low-resource scenarios and for improving robustness to noise and variation in speech.
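To make the idea of discrete units concrete, here is a minimal sketch of the common quantization step: frame-level continuous features (e.g. from a self-supervised encoder) are mapped to discrete unit IDs via a learned codebook. The k-means routine and all names below are illustrative stand-ins, not any specific system's implementation; real pipelines typically cluster HuBERT- or wav2vec-style features.

```python
import numpy as np

def kmeans_quantize(features, n_units=8, n_iters=20, seed=0):
    """Assign each frame-level feature vector to one of n_units discrete
    codebook entries via plain k-means (a simplified stand-in for the
    quantizers applied to self-supervised speech features)."""
    rng = np.random.default_rng(seed)
    # Initialize the codebook from randomly chosen frames.
    centroids = features[rng.choice(len(features), n_units, replace=False)]
    for _ in range(n_iters):
        # Distance from every frame to every codebook entry.
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        units = dists.argmin(axis=1)  # discrete unit ID per frame
        # Update each codebook entry to the mean of its assigned frames.
        for k in range(n_units):
            mask = units == k
            if mask.any():
                centroids[k] = features[mask].mean(axis=0)
    return units, centroids

# Toy "speech features": 100 frames of 16-dim vectors (random placeholder data).
rng = np.random.default_rng(1)
feats = rng.normal(size=(100, 16))
units, codebook = kmeans_quantize(feats, n_units=8)
print(units.shape, codebook.shape)  # one unit ID per frame; (8, 16) codebook
```

The resulting unit sequence can then be fed to a transformer language model or decoder, which is the basic interface the tasks above share.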

Papers