Robust Speech
Robust speech research aims to develop speech processing systems, such as speech recognition and speech synthesis, that perform reliably across diverse and challenging conditions, including noisy environments and varied speaker characteristics. Current efforts to improve model robustness include combining convolutional layers with transformers in large language models, training on massive weakly supervised datasets, and refining neural codec language models to reach human parity in zero-shot text-to-speech. This work matters for applications such as assistive technologies for people with communication impairments, and for building more inclusive and reliable human-computer interaction systems.