Speech Driven

Speech-driven research develops computational models that process and understand spoken language, covering tasks such as speech recognition, speaker identification, and emotion detection. Current work emphasizes multi-task learning frameworks, often built on transformer-based architectures and diffusion models, to improve robustness and efficiency across diverse scenarios and languages. The field matters for advancing human-computer interaction, improving accessibility for people with communication challenges, and enabling applications such as personalized healthcare and virtual assistants.

Papers