Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling [2406.05681]