Efficient Supervised Training of Audio Transformers for Music Representation Learning [2309.16418]