Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment [2211.07275]