Fake Speech Detection

Fake speech detection (FSD) aims to distinguish authentic speech from synthetic or manipulated audio, addressing concerns about malicious use of increasingly sophisticated voice technologies. Current research focuses on robust deep learning models, often built on Res2Net architectures or trained with techniques such as knowledge distillation and data augmentation (e.g., Specmix and Freqmix), to improve accuracy and generalization across diverse audio conditions. These advances matter for combating audio deepfakes and for the security and trustworthiness of voice-based systems in applications such as telephony and music authentication.
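
As a rough illustration of the knowledge distillation idea mentioned above, the sketch below computes a standard temperature-scaled distillation loss for a two-class (bona fide vs. fake) detector. It is a generic formulation, not the method of any particular paper listed here; the function names, the `temperature` and `alpha` values, and the two-class setup are all assumptions made for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target KL term and hard-label cross-entropy.

    student_logits, teacher_logits: arrays of shape (batch, n_classes)
    labels: integer class indices of shape (batch,), e.g. 0 = bona fide, 1 = fake
    """
    eps = 1e-12  # avoids log(0)
    # Soft targets: both distributions are softened with the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=-1)
    # Hard targets: ordinary cross-entropy at temperature 1.
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps)
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return float(np.mean(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce))

# Toy example with made-up logits (not real model outputs).
teacher = np.array([[2.0, -1.0], [0.5, 1.5]])
student = np.array([[1.0, 0.0], [0.2, 0.9]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

In practice the teacher would be a larger pretrained detector and the student a smaller model trained to minimize this loss; how the two terms are weighted is a tuning choice.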

Papers