ASR Foundation Model

ASR foundation models are large, pre-trained speech recognition models designed for broad applicability across many languages and tasks. Current research focuses on improving their adaptability to new languages, particularly low-resource ones, through techniques such as parameter-efficient fine-tuning, alongside methods that mitigate performance degradation on the languages the model already supports. These models are also being investigated for zero-shot audio classification and adapted to specialized applications such as spoken language assessment, which requires modifying their output format and their handling of disfluencies. Robust, adaptable ASR foundation models benefit both the scientific community, through improved benchmarking and dataset creation, and practical applications, by enabling more accurate and versatile speech technologies.
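
To make the parameter-efficient fine-tuning idea concrete, the sketch below shows one common approach: attaching low-rank (LoRA) adapters to a pre-trained Whisper-style model so that only a small fraction of parameters is trained on the new language. It assumes the Hugging Face transformers and peft libraries; the checkpoint name, target modules, and hyperparameters are illustrative choices, not a prescription from any particular paper.

```python
# Minimal sketch: LoRA-based parameter-efficient fine-tuning of an ASR
# foundation model for a new (e.g., low-resource) language.
# Checkpoint, target modules, and hyperparameters are illustrative.
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model

model_name = "openai/whisper-small"  # any pre-trained ASR foundation checkpoint
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# The pre-trained weights stay frozen; only the small LoRA adapters are
# trained, which also limits forgetting on the languages already supported.
lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically ~1% of all parameters

# Fine-tune `model` on the target-language corpus with any standard
# seq2seq training loop (e.g., transformers.Seq2SeqTrainer).
```

Because the base weights are untouched, the adapters can be swapped per language, which is one reason this family of methods is attractive for extending foundation models without degrading existing coverage.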

Papers