Paper ID: 2210.09296

3rd Place Solution for Google Universal Image Embedding

Nobuaki Aoki, Yasumasa Namba

This paper presents the 3rd place solution to the Google Universal Image Embedding Competition on Kaggle. We use ViT-H/14 from OpenCLIP as the backbone of an ArcFace model and train it in two stages: the first stage is trained with the backbone frozen, and the second stage trains the whole model. We achieve a mean Precision @5 of 0.692 on the private leaderboard. Code is available at https://github.com/YasumasaNamba/google-universal-image-embedding
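To make the two ingredients named in the abstract concrete, the following is a minimal pure-Python sketch of the standard ArcFace logit computation and of a two-stage freezing schedule. It is not taken from the authors' repository: the function names (`arcface_logits`, `trainable_groups`), the group labels `"backbone"` and `"head"`, and the `scale` and `margin` values are illustrative assumptions, and the abstract does not state which hyperparameters were actually used.

```python
import math


def l2_normalize(v):
    """Return v scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Standard ArcFace logits for one sample.

    scale and margin are hypothetical defaults, not values from the paper.
    The target class gets scale * cos(theta + margin); every other class
    gets scale * cos(theta), where theta is the angle between the
    normalized embedding and the normalized class weight vector.
    """
    e = l2_normalize(embedding)
    logits = []
    for k, w in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if k == target:
            cos = math.cos(math.acos(cos) + margin)  # additive angular margin
        logits.append(scale * cos)
    return logits


def trainable_groups(stage):
    """Parameter groups updated in each stage (hypothetical group names).

    Stage 1 keeps the backbone frozen; stage 2 trains the whole model.
    """
    if stage == 1:
        return ["head"]
    if stage == 2:
        return ["backbone", "head"]
    raise ValueError("stage must be 1 or 2")


if __name__ == "__main__":
    # Toy 2-D embedding aligned with class 0 of two orthogonal class vectors.
    print(arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0))
    print(trainable_groups(1), trainable_groups(2))
```

The logits would normally be fed to a softmax cross-entropy loss. This sketch omits the special handling that many implementations add for the case where theta + margin exceeds pi, and in a real framework the stage-1 freeze would be expressed by disabling gradient updates for the backbone parameters.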

Submitted: Oct 14, 2022