Text Representation
Text representation focuses on encoding textual data into numerical formats suitable for machine learning, aiming to capture semantic meaning and contextual information. Current research emphasizes leveraging large language models (LLMs) to generate interpretable features, to improve dense retrieval by augmenting text, and to enhance multimodal learning by aligning text with other modalities such as speech and images. These advances matter for a range of applications, including information retrieval, sentiment analysis, and downstream tasks like stock prediction and medical diagnosis.
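To ground the idea of encoding text as numerical vectors, here is a minimal sketch of one classical representation, TF-IDF over a bag-of-words. This is an illustrative baseline in pure Python, not a method from the papers listed below; the corpus strings and the `tfidf` helper are invented for the example.

```python
import math
from collections import Counter

def tfidf(corpus):
    """Encode each document as a {term: weight} mapping (a sparse vector)."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency scaled by inverse document frequency:
            # terms shared by every document get weight 0.
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

corpus = [
    "text embeddings encode documents as vectors",
    "dense retrieval ranks documents by vector similarity",
]
vecs = tfidf(corpus)
```

Modern approaches replace these sparse, count-based vectors with dense embeddings learned by neural models, which capture semantic similarity between texts that share no surface vocabulary.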
Papers
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao
KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering
Iftitahu Ni'mah, Samaneh Khoshrou, Vlado Menkovski, Mykola Pechenizkiy