Towards Learning Universal Audio Representations [2111.12124]