Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for improved information retrieval and downstream tasks. Current research focuses on enhancing embedding quality through novel loss functions (e.g., SimO loss for fine-grained contrastive learning), developing efficient architectures like decoupled embeddings for handling large datasets and multilingual contexts, and exploring non-Euclidean spaces (e.g., hyperbolic space) to better represent complex relationships. These advancements are improving performance in diverse applications, including recommendation systems, question answering, and even cybersecurity by enabling more accurate similarity searches and more effective model training.
Papers
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
Dual VC Dimension Obstructs Sample Compression by Embeddings
Zachary Chase, Bogdan Chornomaz, Steve Hanneke, Shay Moran, Amir Yehudayoff
Node Similarities under Random Projections: Limits and Pathological Cases
Tvrtko Tadić, Cassiano Becker, Jennifer Neville
Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder
Chong Peng, Liqiang He, Dan Su
Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers
Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cassandra Burdziak, Tal Nawy, Anna Gilbert, Dana Pe'er