Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for improved information retrieval and downstream tasks. Current research focuses on enhancing embedding quality through novel loss functions (e.g., SimO loss for fine-grained contrastive learning), developing efficient architectures like decoupled embeddings for handling large datasets and multilingual contexts, and exploring non-Euclidean spaces (e.g., hyperbolic space) to better represent complex relationships. These advancements are improving performance in diverse applications, including recommendation systems, question answering, and even cybersecurity by enabling more accurate similarity searches and more effective model training.
Papers
LowREm: A Repository of Word Embeddings for 87 Low-Resource Languages Enhanced with Multilingual Graph Knowledge
Daniil Gurgurov, Rishu Kumar, Simon Ostermann
Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code
Gary A. McCully, John D. Hastings, Shengjie Xu, Adam Fortier
Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models
Saghir Alfasly, Peyman Nejat, Ghazal Alabtah, Sobhan Hemati, Krishna Rani Kalari, H.R. Tizhoosh
The role of data embedding in quantum autoencoders for improved anomaly detection
Jack Y. Araz, Michael Spannowsky