Speech Waveform

Speech waveform research studies how to analyze and manipulate the raw audio signal of speech in order to improve speech processing technologies. Current work relies heavily on deep learning models, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and generative adversarial networks (GANs). These models are often applied directly to raw waveforms, without an intermediate feature-extraction step, for tasks such as speech synthesis, recognition, and enhancement. The resulting advances matter for applications ranging from hearing aids and voice assistants to forensic speaker identification and more natural-sounding synthetic speech.
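
To make "applied directly to raw waveforms" concrete, here is a minimal sketch of the kind of first layer such models often use: a strided 1-D convolution that slides over the audio samples themselves rather than over precomputed features such as spectrograms. Everything in it is an illustrative assumption and not taken from any particular paper: the 16 kHz sample rate, the 400-sample window, the 160-sample hop, the synthetic sine-wave input, and the fixed averaging kernel, which stands in for a filter that a real model would learn.

```python
import math


def sine_wave(freq_hz, sample_rate, n_samples):
    """Generate a mono sine wave as a list of float samples in [-1, 1]."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


def conv1d(signal, kernel, stride):
    """Strided 1-D convolution (cross-correlation, no padding) over samples."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


# Hypothetical settings: 16 kHz audio, 400-sample window, 160-sample hop.
SAMPLE_RATE = 16000
KERNEL_SIZE = 400
STRIDE = 160

# Synthetic stand-in for a speech recording: 0.1 s of a 440 Hz tone.
waveform = sine_wave(440.0, SAMPLE_RATE, 1600)

# Stand-in for a learned filter: a fixed averaging kernel.
kernel = [1.0 / KERNEL_SIZE] * KERNEL_SIZE

# One feature value per hop, computed straight from the raw samples.
features = relu(conv1d(waveform, kernel, STRIDE))
print(len(waveform), len(features))
```

With these settings the 1600 input samples are reduced to 8 feature values, one per hop. A real waveform model would stack many such filters and layers and learn the kernel weights from data.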

Papers