Spectrogram Based

Spectrogram-based analysis represents audio signals as time-frequency maps, typically the magnitude of a short-time Fourier transform, so that they can be processed and interpreted much like images. Current research emphasizes deep learning architectures such as convolutional neural networks (CNNs), vision transformers (ViTs), and recurrent neural networks (RNNs), often combined with generative adversarial networks (GANs), for tasks like anomaly detection, classification, and source separation. By extracting temporal and spectral features directly from the spectrogram, these methods aim to improve accuracy and efficiency in applications including speech recognition, music information retrieval, and medical signal processing.
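
As a rough illustration of the time-frequency map described above, the sketch below computes a power spectrogram with a Hann-windowed short-time Fourier transform in NumPy. The function name `spectrogram`, the parameters `frame_len` and `hop`, and the 1 kHz test tone are assumptions chosen for this example; they are not taken from any particular paper listed here.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=256, hop=128):
    """Return (freqs, times, power) for a 1-D signal via a Hann-windowed STFT.

    Hypothetical helper for illustration; frame_len and hop are assumed defaults.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames,
    # shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # One-sided FFT of each frame; squared magnitude gives power.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Time stamp of each frame, taken at the frame centre.
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    # Transpose so rows are frequency bins and columns are time frames.
    return freqs, times, power.T

# Usage: one second of a 1 kHz tone sampled at 8 kHz (assumed test signal).
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
freqs, times, power = spectrogram(tone, sr)
# Frequency bin with the highest average power across all frames.
peak_hz = freqs[power.mean(axis=1).argmax()]
```

The resulting 2-D array (frequency bins by time frames) is the kind of image-like input that a CNN or ViT would consume. In practice it is usually converted to a log or mel scale first, which this sketch leaves out.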

Papers