Visual-Semantic Transformer for Scene Text Recognition [2112.00948]