Adversarial Audio

Adversarial audio research focuses on creating subtly altered audio that sounds unchanged to human listeners but is designed to fool automatic speech recognition (ASR) and speaker verification (SV) systems. Current work explores a range of attack methods, including perturbation generation with neural-network-based Hammerstein models, linguistic features that manipulate transcriptions efficiently, and universal adversarial segments that affect multiple models at once. This research is crucial for understanding and mitigating security vulnerabilities in voice-controlled devices and biometric systems, and it informs the development of robust and trustworthy AI applications.
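
As a rough illustration of the core idea behind perturbation-based attacks, the sketch below takes a single gradient step on a raw waveform so that a toy CTC-based recognizer moves toward an attacker-chosen transcription. The model (ToyASR), the target label sequence, and the epsilon bound are hypothetical placeholders for illustration, not taken from any of the papers collected here.

```python
# Minimal FGSM-style targeted perturbation on a raw waveform (a sketch,
# not a reproduction of any specific published attack).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyASR(nn.Module):
    """Hypothetical stand-in ASR: maps a waveform to per-frame character logits."""
    def __init__(self, vocab_size=29, frame=160):
        super().__init__()
        self.frame = frame
        self.net = nn.Sequential(nn.Linear(frame, 64), nn.ReLU(),
                                 nn.Linear(64, vocab_size))

    def forward(self, wav):                               # wav: (batch, samples)
        frames = wav.unfold(1, self.frame, self.frame)    # (B, T, frame)
        return self.net(frames).log_softmax(-1)           # (B, T, vocab)

model = ToyASR().eval()
ctc_loss = nn.CTCLoss(blank=0)

wav = torch.randn(1, 16000)            # 1 s of audio at 16 kHz (placeholder)
target = torch.tensor([[5, 3, 9, 7]])  # arbitrary attacker-chosen label sequence
epsilon = 0.002                        # per-sample perturbation bound (assumed)

# One gradient step toward the target transcription.
delta = torch.zeros_like(wav, requires_grad=True)
log_probs = model(wav + delta).transpose(0, 1)            # (T, B, vocab) for CTC
input_lens = torch.full((1,), log_probs.size(0), dtype=torch.long)
target_lens = torch.tensor([target.size(1)])
loss = ctc_loss(log_probs, target, input_lens, target_lens)
loss.backward()

# Step against the gradient of the target loss, keeping the change small.
adv_wav = (wav - epsilon * delta.grad.sign()).clamp(-1.0, 1.0)
print(f"target CTC loss before step: {loss.item():.3f}")
```

Published attacks typically iterate many such steps, constrain the perturbation with psychoacoustic masking so it stays imperceptible, and average gradients over simulated playback distortions when robustness to over-the-air playback is required.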

Papers