Non-Speech Audio

Non-speech audio research extracts meaningful information and patterns from audio signals other than human speech, with the aim of understanding and using diverse acoustic phenomena. Current work emphasizes robust models, often built on transformer architectures and diffusion models, for tasks such as audio editing, audio generation (e.g., from video), and classification across domains such as music and environmental sounds. The field matters for advancing audio representation learning and enables applications such as privacy-preserving crowd analysis, improved audio-visual systems, and more sophisticated audio-based content analysis for tasks like hate speech detection.

Papers