Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for information retrieval and downstream tasks. Current research focuses on improving embedding quality through novel loss functions (e.g., SimO loss for fine-grained contrastive learning), developing efficient architectures such as decoupled embeddings for large datasets and multilingual contexts, and exploring non-Euclidean spaces (e.g., hyperbolic space) to better represent complex relationships. These advances improve performance in applications including recommendation systems, question answering, and cybersecurity, by enabling more accurate similarity search and more effective model training.
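As a rough illustration of the similarity search mentioned above, the sketch below ranks a few stored vectors by cosine similarity to a query vector. The three-dimensional vectors are invented toy values, not output from any embedding model, and the names `cosine_similarity`, `top_k`, and the document ids are hypothetical, chosen only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (assumed values; real embeddings have hundreds of dimensions).
corpus = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In practice the vectors would come from an embedding model, and a linear scan like this is usually replaced by an approximate nearest-neighbor index once the corpus grows large.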
Papers
Unsupervised Semantic Correspondence Using Stable Diffusion
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
Estimating class separability of text embeddings with persistent homology
Kostis Gourgoulias, Najah Ghalyan, Maxime Labonne, Yash Satsangi, Sean Moran, Joseph Sabelja
Is a Prestigious Job the same as a Prestigious Country? A Case Study on Multilingual Sentence Embeddings and European Countries
Jindřich Libovický
Linear Cross-Lingual Mapping of Sentence Embeddings
Oleg Vasilyev, Fumika Isono, John Bohannon
Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation
Di Wu, Christof Monz
Discovering Universal Geometry in Embeddings with ICA
Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira
POEM: Polarization of Embeddings for Domain-Invariant Representations
Sang-Yeong Jo, Sung Whan Yoon
3D Rotation and Translation for Hyperbolic Knowledge Graph Embedding
Yihua Zhu, Hidetoshi Shimodaira