Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation [2303.16541]