Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition [2303.08356]