Acoustic Prompt
Acoustic prompting uses audio features to guide machine learning models, mainly in speech synthesis and speech analysis tasks. Current research uses acoustic properties (e.g., pitch, intensity) to generate descriptive prompts for training models, to improve zero-shot text-to-speech systems, and to improve the segmentation of complex objects such as surgical instruments. The approach shows promise for more accurate and natural speech generation, as well as for more nuanced emotion recognition and object segmentation, with applications in both audio processing and computer vision.
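As a rough illustration of turning acoustic properties such as pitch and intensity into a descriptive prompt, the sketch below measures both on a synthetic tone and maps them to a short text description. It is a minimal sketch, not the method of any particular paper: the function names, the autocorrelation pitch estimator, the threshold values, and the prompt wording are all assumptions chosen for illustration, and real systems typically rely on more robust feature extractors.

```python
import math

def rms_dbfs(samples):
    """Root-mean-square level in dB relative to full scale (amplitude 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))

def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Crude pitch estimate: the lag with the highest autocorrelation in [fmin, fmax].

    Assumes the signal is longer than sample_rate / fmin samples.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def describe(pitch_hz, level_db):
    """Map numeric features to a text prompt (thresholds are hypothetical)."""
    if pitch_hz < 165:
        pitch = "low-pitched"
    elif pitch_hz < 255:
        pitch = "medium-pitched"
    else:
        pitch = "high-pitched"
    if level_db < -30:
        loudness = "quiet"
    elif level_db < -12:
        loudness = "moderately loud"
    else:
        loudness = "loud"
    return f"A {pitch}, {loudness} voice."

# Synthetic stand-in for a speech frame: a 220 Hz tone, 0.1 s at 16 kHz.
SAMPLE_RATE = 16_000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(1600)]

prompt = describe(estimate_pitch_hz(tone, SAMPLE_RATE), rms_dbfs(tone))
print(prompt)
```

A description produced this way could be paired with the corresponding audio as a text prompt. In practice the thresholds would need to be calibrated per speaker group and dataset rather than fixed as they are here.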