Neural Speech

Neural speech coding uses deep learning models to compress and reconstruct speech signals, targeting high fidelity at low bitrates. Current research focuses on three areas: model efficiency (e.g., smaller architectures such as ConvMixers and optimized quantization techniques such as scalar quantization), robustness to noise and packet loss (e.g., via GANs and feature-domain packet loss concealment), and personalization for higher quality at lower complexity. These advances matter for real-time communication systems, enabling high-quality speech transmission in bandwidth-constrained settings such as VoIP and on low-power devices.
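
As a rough illustration of the scalar quantization mentioned above, the sketch below uniformly quantizes each latent value independently and computes the resulting bitrate. It is not taken from any of the listed papers: the function names, the value range [-1, 1], and the example figures (30 latent values per frame, 4 levels, 50 frames per second) are all assumptions chosen for illustration.

```python
import math


def scalar_quantize(values, num_levels, lo=-1.0, hi=1.0):
    """Map each value independently to the index of the nearest of
    num_levels uniformly spaced levels in [lo, hi] (hypothetical range)."""
    step = (hi - lo) / (num_levels - 1)
    indices = []
    for v in values:
        clipped = min(max(v, lo), hi)  # out-of-range values saturate
        indices.append(round((clipped - lo) / step))
    return indices


def dequantize(indices, num_levels, lo=-1.0, hi=1.0):
    """Recover the level value for each transmitted index."""
    step = (hi - lo) / (num_levels - 1)
    return [lo + i * step for i in indices]


def bitrate_bps(values_per_frame, num_levels, frames_per_second):
    """Bits per second with fixed-length codes and no entropy coding."""
    bits_per_value = math.ceil(math.log2(num_levels))
    return values_per_frame * bits_per_value * frames_per_second


if __name__ == "__main__":
    latent = [-1.0, 0.3, 0.0, 2.0]  # illustrative latent values
    idx = scalar_quantize(latent, num_levels=5)
    print(idx, dequantize(idx, num_levels=5))
    print(bitrate_bps(30, 4, 50), "bits per second")
```

With these assumed settings, each value costs 2 bits, so a frame costs 60 bits and the stream runs at 3000 bits per second. Fewer levels or fewer values per frame lower the bitrate at the cost of a coarser reconstruction, which is the trade-off low-bitrate codecs have to manage.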

Papers

November 22, 2022

Disentangled Feature Learning for Real-Time Neural Speech Coding
Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu
Neural Audio Neural Speech Low Bitrate Self Supervised Speech Representation Model

November 4, 2022

Neural Feature Predictor and Discriminative Residual Coding for Low-Bitrate Speech Coding
Haici Yang, Wootaek Lim, Minje Kim
Neural Vocoder Residual Image Neural Speech Low Bitrate Residual Coding

July 18, 2022

Latent-Domain Predictive Neural Speech Coding
Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu
Low Latency Predictive Coding Neural Audio Neural Speech Vec Tok Codec Speech Codec

July 7, 2022

July 5, 2022

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers
Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund
Pre Trained Transformer Neural Speech High Fidelity Speech Transformer Embeddings Low Bitrate

July 3, 2022

Towards Error-Resilient Neural Speech Coding
Huaying Xue, Xiulian Peng, Xue Jiang, Yan Lu
Neural Audio Neural Codec Neural Speech Packet Loss Concealment

January 24, 2022

End-to-End Neural Speech Coding for Real-Time Communications
Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu
Real Time Speech Enhancement Neural Speech Audio Coding Temporal Filter Neural End 2 End Speech

January 15, 2022

ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting
Dianwen Ng, Yunqi Chen, Biao Tian, Qiang Fu, Eng Siong Chng
Curriculum Learning Attention Module Keyword Spotting Embracing CompAct Noise Robustness Neural Speech ConvMixer Model Interactive Deep Learning