Diverse and Vivid Sound Generation from Text Descriptions [2305.01980]