Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation [2409.11003]