Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for information retrieval and downstream tasks. Current research focuses on improving embedding quality through novel loss functions (e.g., the SimO loss for fine-grained contrastive learning), on efficient architectures such as decoupled embeddings for large datasets and multilingual settings, and on non-Euclidean geometries (e.g., hyperbolic space) that better represent hierarchical or otherwise complex relationships. By enabling more accurate similarity search and more effective model training, these advances improve performance across diverse applications, including recommendation systems, question answering, and cybersecurity.
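As a concrete illustration of the similarity search these embeddings enable, the sketch below encodes a query and a small corpus as dense vectors and ranks the corpus by cosine similarity. The model name "jinaai/jina-embeddings-v2-base-en" and the sentence-transformers API are assumptions for illustration, not something specified above; any text-embedding model with an encode() method would work the same way.

```python
# Minimal sketch: semantic similarity search with dense text embeddings.
# Model name and library choice are assumptions; swap in your own model.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en",
                            trust_remote_code=True)

corpus = [
    "How do I reset my account password?",
    "A quick recipe for weeknight pasta",
    "Steps to recover a forgotten login",
]
query = "I forgot my password"

# Unit-normalized vectors make the dot product equal to cosine similarity.
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)
scores = corpus_emb @ query_emb

# Rank corpus entries by similarity to the query, best first.
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")
```

Hyperbolic or other non-Euclidean approaches replace the cosine ranking with a geodesic distance (e.g., the Poincaré distance), but the retrieve-by-nearest-neighbor pattern stays the same.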
Papers
Mapping the Multiverse of Latent Representations
Jeremy Wayland, Corinna Coupette, Bastian Rieck
Vaccine: Perturbation-aware Alignment for Large Language Models against Harmful Fine-tuning Attack
Tiansheng Huang, Sihao Hu, Ling Liu
Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
Samuel Stevens, Emily Wenger, Cathy Li, Niklas Nolte, Eshika Saxena, François Charton, Kristin Lauter