Vietnamese Image Captioning

Vietnamese image captioning research develops and evaluates models that automatically generate descriptive captions for images in Vietnamese, a language with far fewer captioning resources than English. Current efforts concentrate on adapting and improving transformer-based architectures, often pairing them with convolutional neural networks for image feature extraction and using attention mechanisms to refine caption generation. This work helps bridge the language gap in computer vision: it enables applications such as automated image annotation and accessibility tools for Vietnamese speakers, and it provides benchmarks for low-resource language processing.

Papers