Paper ID: 2310.19349

Japanese SimCSE Technical Report

Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda

We report the development of Japanese SimCSE, a set of Japanese sentence embedding models fine-tuned with SimCSE. Because Japanese lacks sentence embedding models that can serve as baselines in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for Japanese SimCSE and the evaluation results of the resulting models.

Submitted: Oct 30, 2023