Mel Spectrogram
A mel spectrogram is a time-frequency representation of audio in which the frequency axis is mapped onto the mel scale, a perceptual scale that emphasizes the frequency ranges most relevant to human hearing. Current research focuses on improving mel spectrogram generation and manipulation with deep learning architectures, including variational autoencoders, normalizing flows, diffusion models, and transformers, often applied to tasks such as audio compression, speech synthesis, and speech enhancement. These advances support applications such as speech recognition, music generation, and audio forensics, improving both the quality and the efficiency of audio processing. The resulting gains in audio analysis and synthesis also matter for fields such as assistive technologies and ecological monitoring.
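As a rough illustration of the representation described above, the following minimal sketch computes a log-mel spectrogram with NumPy only. It is not taken from any of the papers listed here. The function names, the parameter values (`n_fft`, `hop`, `n_mels`), and the choice of the HTK-style mel formula are all assumptions made for the example; audio libraries may use different mel formulas, filter normalization, and framing conventions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Illustrative defaults, not recommended settings.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel_power + 1e-10).T            # dB scale, (n_mels, n_frames)

# Usage: one second of a 440 Hz sine tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
S = mel_spectrogram(tone, sr)
peak_band = int(S.mean(axis=1).argmax())  # mel band holding most of the tone's energy
```

Rows of `S` are mel bands and columns are time frames. Because the filters are spaced evenly on the mel scale, low frequencies get finer resolution than high frequencies, which is the perceptual emphasis the summary refers to.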
Papers
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
A Comparative Study on Approaches to Acoustic Scene Classification using CNNs
Ishrat Jahan Ananya, Sarah Suad, Shadab Hafiz Choudhury, Mohammad Ashrafuzzaman Khan