Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning [2303.02648]