Jina Embeddings
Jina embeddings are vector representations of data, primarily text and images, designed to capture semantic meaning and relationships for information retrieval and downstream tasks. Current research focuses on improving embedding quality through novel loss functions (e.g., SimO loss for fine-grained contrastive learning), developing efficient architectures such as decoupled embeddings for handling large datasets and multilingual contexts, and exploring non-Euclidean spaces (e.g., hyperbolic space) to better represent complex relationships. These advances improve performance in applications such as recommendation systems, question answering, and cybersecurity by enabling more accurate similarity search and more effective model training.
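As a rough illustration of the similarity search mentioned above, the following minimal sketch ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors are invented toy values, not the output of any embedding model, and the names `cosine_similarity`, `top_k`, and the document IDs are hypothetical helpers chosen for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumed values for illustration, not real embeddings).
corpus = {
    "doc_retrieval": [0.9, 0.1, 0.0],
    "doc_recommendation": [0.7, 0.6, 0.1],
    "doc_cooking": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In practice the vectors would come from an embedding model and have hundreds of dimensions, and a large corpus would typically use an approximate nearest-neighbor index instead of this exhaustive scan.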
Papers
Leveraging Pre-trained and Transformer-derived Embeddings from EHRs to Characterize Heterogeneity Across Alzheimer's Disease and Related Dementias
Matthew West, Colin Magdamo, Lily Cheng, Yingnan He, Sudeshna Das
Classification and Clustering of Sentence-Level Embeddings of Scientific Articles Generated by Contrastive Learning
Gustavo Bartz Guedes, Ana Estela Antunes da Silva
Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP)
Huanran Li, Manh Nguyen, Daniel Pimentel-Alarcón
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Haitao Li, Qingyao Ai, Xinyan Han, Jia Chen, Qian Dong, Yiqun Liu, Chong Chen, Qi Tian
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang