Acoustic Token

Acoustic tokenization is the process of converting continuous audio signals into discrete units that machine learning models can process, with a primary focus on improving the performance of audio language models (ALMs). Current research emphasizes tokenization methods that better preserve semantic information, often employing transformer-based architectures and exploring techniques such as residual vector quantization and mel-filterbank discretization. This work underpins a range of audio applications, including speech recognition, speech synthesis, music generation, and voice conversion, by enabling more accurate and efficient processing of audio data.
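To make the idea of residual vector quantization concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the method of any particular paper: the function names `rvq_encode` and `rvq_decode` are hypothetical, the codebooks are random rather than learned, and the input frames are random stand-ins for the feature vectors a real audio encoder would produce.

```python
import numpy as np


def rvq_encode(frames, codebooks):
    """Turn each frame into one token per stage.

    Each stage picks the nearest codeword to the residual left over
    by the previous stages, then subtracts it.
    """
    residual = np.asarray(frames, dtype=np.float64).copy()
    tokens = []
    for cb in codebooks:  # cb has shape (codebook_size, dim)
        # Squared Euclidean distance from every residual to every codeword.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        tokens.append(idx)
        residual = residual - cb[idx]
    return np.stack(tokens, axis=1)  # shape (num_frames, num_stages)


def rvq_decode(tokens, codebooks):
    """Reconstruct frames by summing the selected codeword from each stage."""
    return sum(cb[tokens[:, i]] for i, cb in enumerate(codebooks))


# Assumption: random data and untrained codebooks, purely for illustration.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))  # 5 frames of 8-dim features
codebooks = [rng.normal(size=(16, 8)) * 0.5 ** i for i in range(3)]

tokens = rvq_encode(frames, codebooks)  # discrete acoustic tokens
recon = rvq_decode(tokens, codebooks)   # approximate frames
```

Each frame ends up as a short tuple of integer indices, one per stage, which is the kind of discrete unit a language model can consume. In practice the codebooks are trained together with an encoder and decoder rather than sampled at random.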

Papers