Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions [2407.04416]