Timbre Descriptor

Timbre descriptors aim to capture the characteristic quality of a sound that remains once pitch and loudness are accounted for. This quality is what lets a listener tell a violin from a flute playing the same note at the same level. Such descriptors enable more expressive and controllable audio synthesis and manipulation. Current research focuses on disentangling timbre from other audio attributes, such as pitch, using techniques including contrastive learning, variational autoencoders, and generative adversarial networks. These techniques are often embedded in hierarchical models that capture both global and local timbral variation. This work aims to improve the realism and expressiveness of speech and music synthesis, with applications ranging from voice conversion and zero-shot speech synthesis to the design of novel musical instruments and sound effects.
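
To make "beyond pitch and loudness" concrete, the sketch below uses the spectral centroid, a classical hand-crafted timbre descriptor, rather than the learned representations described above. It builds two tones with the same fundamental (pitch) and the same RMS level (a crude loudness proxy) but different harmonic content. It then shows that their centroids differ. The helper names (`harmonic_tone`, `normalize_rms`, `spectral_centroid`), the 8 kHz sample rate, the 512-sample window, and the 1/k harmonic amplitudes are illustrative assumptions, not taken from any particular paper or library. A naive DFT keeps the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; illustrative choice
N = 512             # analysis length; 250 Hz falls exactly on DFT bin 16
F0 = 250.0          # shared fundamental frequency, i.e. the same "pitch"


def harmonic_tone(amplitudes, f0=F0, sr=SAMPLE_RATE, n=N):
    """Sum of sine harmonics at k*f0 with the given amplitudes (k = 1, 2, ...)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t / sr)
            for k, a in enumerate(amplitudes))
        for t in range(n)
    ]


def normalize_rms(x, target=0.1):
    """Scale a signal so its RMS level (a crude loudness proxy) equals `target`."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v * target / rms for v in x]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a small demo)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(x, sr=SAMPLE_RATE):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mags = magnitude_spectrum(x)
    freqs = [k * sr / len(x) for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


# Same pitch and same RMS level, different harmonic content.
pure = normalize_rms(harmonic_tone([1.0]))
bright = normalize_rms(harmonic_tone([1.0 / k for k in range(1, 9)]))
print(round(spectral_centroid(pure), 1))    # 250.0
print(round(spectral_centroid(bright), 1))  # 735.9
```

The brighter tone's energy is spread over higher harmonics, so its centroid rises even though pitch and level are unchanged. A single scalar like this captures only one facet of timbre. This limitation is part of why learned, disentangled representations are of interest.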

Papers