Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for improved information retrieval and downstream tasks. Current research focuses on three directions: enhancing embedding quality through novel loss functions (e.g., SimO loss for fine-grained contrastive learning), developing efficient architectures such as decoupled embeddings for large datasets and multilingual contexts, and exploring non-Euclidean spaces (e.g., hyperbolic space) to better represent complex relationships. By enabling more accurate similarity searches and more effective model training, these advancements are improving performance in diverse applications, including recommendation systems, question answering, and cybersecurity.
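To make the similarity-search idea concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It uses only the Python standard library and does not call any Jina model or API. The function names (`cosine_similarity`, `top_k`), the document ids, and the 3-dimensional toy vectors are hypothetical, chosen purely for illustration. Real text embeddings typically have hundreds of dimensions and would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in corpus.items()]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors (assumption): stand-ins for model-produced embeddings
TOY_CORPUS = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
TOY_QUERY = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    for doc_id, score in top_k(TOY_QUERY, TOY_CORPUS, k=2):
        print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the two documents whose vectors point in roughly the same direction as the query rank above the unrelated one. A brute-force scan like this is adequate for small collections; larger collections usually rely on an approximate nearest-neighbor index instead.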
Papers
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao
Chain-of-Thought Embeddings for Stance Detection on Social Media
Joseph Gatto, Omar Sharif, Sarah Masud Preum
Community Detection Guarantees Using Embeddings Learned by Node2Vec
Andrew Davison, S. Carlyle Morgan, Owen G. Ward
Uncovering Meanings of Embeddings via Partial Orthogonality
Yibo Jiang, Bryon Aragam, Victor Veitch
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu