Spectrogram Based
Spectrogram-based analysis represents audio signals as time-frequency maps, typically computed with the short-time Fourier transform (STFT), so that temporal and spectral structure can be processed and interpreted as a two-dimensional, image-like array. Current research emphasizes deep learning architectures such as convolutional neural networks (CNNs), vision transformers (ViTs), and recurrent neural networks (RNNs), sometimes combined with generative adversarial networks (GANs), for tasks like anomaly detection, classification, and source separation. These models extract complex temporal and spectral features from the spectrogram, which has improved accuracy and efficiency in applications including speech recognition, music information retrieval, and medical signal processing.
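To make the time-frequency map concrete, here is a minimal sketch of a magnitude spectrogram computed via the STFT in pure Python. It assumes a periodic Hann window, illustrative `frame_len` and `hop` values, and a naive DFT in place of an FFT. The function names are hypothetical, and the sketch is not the method of any paper listed below.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (an assumed, common window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectrogram(signal: list[float], frame_len: int = 64, hop: int = 32) -> list[list[float]]:
    """Hypothetical helper: magnitude STFT.

    Rows are time frames; columns are frequency bins 0..frame_len // 2.
    """
    window = hann(frame_len)
    frames = []
    # Slide a window along the signal; each position yields one time frame.
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + k] * window[k] for k in range(frame_len)]
        # Naive O(N^2) DFT for illustration only; real pipelines use an FFT
        # (e.g. numpy.fft.rfft). Only non-negative frequencies are kept because
        # the spectrum of a real-valued signal is conjugate-symmetric.
        bins = []
        for f in range(frame_len // 2 + 1):
            acc = sum(
                segment[k] * cmath.exp(-2j * math.pi * f * k / frame_len)
                for k in range(frame_len)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


if __name__ == "__main__":
    sr = 8000  # illustrative sample rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
    spec = spectrogram(tone)
    print(len(spec), len(spec[0]))  # 15 frames x 33 frequency bins
    peak = max(range(len(spec[0])), key=lambda b: spec[0][b])
    print(peak * sr / 64)  # bin index converted to Hz: 1000.0
```

With a 64-sample frame at 8 kHz, each bin spans 125 Hz, so a 1 kHz tone peaks in bin 8 of every frame. The output is a plain time-by-frequency array, which is the form that CNN or ViT front ends consume as an image.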
Papers
QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang
Algorithms for audio inpainting based on probabilistic nonnegative matrix factorization
Ondřej Mokrý, Paul Magron, Thomas Oberlin, Cédric Févotte