Multitask Speech

Multitask speech models aim to perform multiple speech-related tasks, such as speech recognition, translation, and synthesis, within a single unified architecture, improving efficiency and generalizability compared to maintaining separate single-task models. Current research focuses on efficient inference for large multitask models, leveraging techniques like token reduction and knowledge distillation to cut compute and memory costs while preserving accuracy. These advances matter because they enable more robust and versatile speech processing systems, with potential impact on applications such as virtual assistants, accessibility technologies, and cross-lingual communication.
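To make one of the techniques above concrete, the following is a minimal sketch of a knowledge-distillation loss, in which a small student model is trained to match the temperature-softened output distribution of a large teacher. The function names and the NumPy-only formulation are illustrative, not taken from any specific paper listed below.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge" about
    # relative class similarities.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures
    # (the standard scaling from Hinton et al.'s distillation setup).
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1,
    )
    return (temperature ** 2) * kl.mean()
```

In practice this term is combined with the usual cross-entropy loss on ground-truth labels, and for a multitask speech model the same scheme can be applied per task head, distilling each of the teacher's output distributions into the corresponding student head.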

Papers