Synthetic Speech Detection
Synthetic speech detection aims to distinguish artificially generated speech from genuine human speech, countering the growing threat of audio deepfakes and fraudulent voice impersonation. Current research relies heavily on deep learning. It uses architectures such as Transformers, ResNets, and their variants, often combined with techniques like multi-head self-attention and feature fusion, to improve accuracy and robustness across different synthesis methods and noise conditions. The field is important for safeguarding against financial fraud, misinformation campaigns, and identity theft. It also drives ongoing efforts to build detection models that generalize better, are more interpretable, and resist adversarial attacks and compression artifacts.
Papers
Detecting Synthetic Speech Manipulation in Real Audio Recordings
Md Hafizur Rahman, Martin Graciarena, Diego Castan, Chris Cobo-Kroenke, Mitchell McLaren, Aaron Lawson
Open Challenges in Synthetic Speech Detection
Luca Cuccovillo, Christoforos Papastergiopoulos, Anastasios Vafeiadis, Artem Yaroshchuk, Patrick Aichroth, Konstantinos Votis, Dimitrios Tzovaras