Timbre Descriptor
Timbre descriptors aim to capture the quality that distinguishes two sounds of the same pitch and loudness (for example, a violin and a flute playing the same note), enabling more expressive and controllable audio synthesis and manipulation. Current research focuses on disentangling timbre from other audio features, such as pitch, using techniques including contrastive learning, variational autoencoders, and generative adversarial networks. These are often placed within hierarchical models to capture both global and local variations. This work aims to improve the realism and expressiveness of speech and music synthesis, with applications ranging from voice conversion and zero-shot speech synthesis to the creation of novel musical instruments and sound effects.
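The learned approaches above do not fit in a few lines. A classic hand-crafted feature, the spectral centroid (the magnitude-weighted mean frequency of a spectrum), can still illustrate a descriptor that responds to timbre while ignoring pitch and loudness. This sketch swaps in that feature for illustration only. It is not one of the methods used by the papers listed here. The helper names `synth_tone`, `magnitude_spectrum`, and `spectral_centroid` are hypothetical. The 8 kHz sample rate and 400-sample frame are arbitrary assumptions, and a naive DFT is used only to avoid external dependencies.

```python
import cmath
import math


def synth_tone(f0, harmonic_amps, sr=8000, n=400):
    """Sum of sine harmonics of f0; harmonic_amps[h] scales harmonic h+1."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (h + 1) * t / sr)
            for h, a in enumerate(harmonic_amps))
        for t in range(n)
    ]


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); fine for short frames)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(x, sr=8000):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    mags = magnitude_spectrum(x)
    n = len(x)
    freqs = [k * sr / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


# Same 200 Hz fundamental (same pitch), different harmonic content (timbre).
dark = synth_tone(200, [1.0])
bright = synth_tone(200, [1.0, 0.8, 0.6, 0.4])
louder_bright = [3.0 * s for s in bright]  # same timbre, higher level

print(round(spectral_centroid(dark)))           # 200
print(round(spectral_centroid(bright)))         # 429
print(round(spectral_centroid(louder_bright)))  # 429
```

Both tones share a 200 Hz fundamental, yet their centroids differ because the brighter tone puts more energy in upper harmonics. Scaling the signal by 3 leaves the centroid unchanged, since it is a ratio of magnitude-weighted sums. A practical implementation would use an FFT and frame-wise analysis rather than this single-frame naive DFT.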
Papers
July 5, 2024
June 9, 2024
January 16, 2024
September 15, 2023
July 18, 2023