Whisper Encoder
Whisper, a large pre-trained speech model, is being widely adapted to speech processing tasks beyond its original automatic speech recognition capabilities. Current research focuses on reusing Whisper's encoder, often together with other models such as LLMs, to improve speaker verification, emotion recognition, and low-resource language processing through techniques such as multi-scale feature aggregation and parameter-efficient fine-tuning. This adaptability points to Whisper's potential as a foundation model for diverse applications in fields such as healthcare (suicide risk detection, speech therapy assessment) and human-computer interaction.
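As a rough illustration of what feature aggregation across encoder layers can look like, the sketch below fuses per-layer outputs with softmax-normalized layer weights and then mean-pools over time to get one utterance-level vector. It is a minimal, dependency-free sketch under stated assumptions, not the method of any particular paper listed here: the function names `aggregate_layers` and `mean_pool`, the nested-list layout (layers, then frames, then feature dimensions), and the toy numbers are all hypothetical. In practice the layer scores would be learned parameters and the inputs would be hidden states taken from a real Whisper encoder.

```python
import math


def softmax(scores):
    """Turn raw layer scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_outputs, layer_scores):
    """Weighted sum of encoder layer outputs.

    layer_outputs: list of L layers, each a list of T frames,
                   each frame a list of D floats (hypothetical layout).
    layer_scores:  list of L raw scores (learned in a real system).
    Returns a list of T fused frames, each with D floats.
    """
    weights = softmax(layer_scores)
    num_frames = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    fused = [[0.0] * dim for _ in range(num_frames)]
    for w, layer in zip(weights, layer_outputs):
        for t in range(num_frames):
            for d in range(dim):
                fused[t][d] += w * layer[t][d]
    return fused


def mean_pool(frames):
    """Average over the time axis to get one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


# Toy input: 2 layers, 2 frames, 2 feature dimensions (made-up values).
toy_layers = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
toy_scores = [0.0, 0.0]  # equal scores give equal layer weights

fused = aggregate_layers(toy_layers, toy_scores)
embedding = mean_pool(fused)
```

The pooled vector could then feed a small task head, for example a speaker or emotion classifier, while the encoder itself stays frozen or is tuned with a parameter-efficient method.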
Papers
August 28, 2024
August 15, 2024
July 14, 2024
June 19, 2024
June 12, 2024
June 9, 2024
June 6, 2024
February 20, 2024
January 21, 2024
November 15, 2023
September 22, 2023
September 18, 2023
May 28, 2023
May 23, 2023
May 18, 2023