Target Sound Extraction

Target sound extraction (TSE) isolates a desired sound from a mixture, guided by a clue such as a sound class label, an audio query, a timestamp, or a natural-language description. Current research relies heavily on deep learning models, including transformers, diffusion probabilistic models, and state-space models, and often incorporates pre-trained audio foundation models to improve performance and generalization. Potential applications include assistive hearing technologies, audio editing, and human-computer interaction.
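
To make the idea of clue-conditioned extraction concrete, here is a toy sketch in Python. It is not how the models named above work: instead of a learned network, it assumes each sound class occupies a known, non-overlapping frequency band, and it uses the class label as the clue to pick which band to keep. The function `extract_target`, the `bands` table, and the "siren"/"hum" classes are all hypothetical names chosen for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for a toy signal)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def extract_target(mixture, clue, bands, sample_rate):
    """Keep only the frequency bins inside the band selected by the clue.

    Assumption: `bands` maps a class label to a (low_hz, high_hz) range.
    A real TSE system would replace this fixed mask with a learned model.
    """
    low_hz, high_hz = bands[clue]
    n = len(mixture)
    masked = []
    for k, value in enumerate(dft(mixture)):
        # Bins above n/2 mirror the negative frequencies, so fold them back.
        freq_hz = min(k, n - k) * sample_rate / n
        masked.append(value if low_hz <= freq_hz < high_hz else 0j)
    return idft(masked)


# Synthetic one-second mixture of two pure tones (hypothetical "classes").
sample_rate = 200
times = [i / sample_rate for i in range(sample_rate)]
siren = [math.sin(2 * math.pi * 40 * t) for t in times]
hum = [0.5 * math.sin(2 * math.pi * 10 * t) for t in times]
mixture = [s + h for s, h in zip(siren, hum)]

bands = {"hum": (0.0, 20.0), "siren": (20.0, 100.0)}

# The clue "siren" selects which source to pull out of the mixture.
estimate = extract_target(mixture, "siren", bands, sample_rate)
max_error = max(abs(e - s) for e, s in zip(estimate, siren))
```

The extractor takes two inputs, the mixture and the clue, and changing the clue changes which source comes out. Real systems keep that interface but learn the mapping from data, because real sounds overlap in time and frequency and cannot be separated by a fixed band mask.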

Papers