Text Representation
Text representation focuses on encoding textual data into numerical formats suitable for machine learning, aiming to capture semantic meaning and contextual information. Current research emphasizes leveraging large language models (LLMs) to generate interpretable features, to improve dense retrieval by augmenting text, and to enhance multimodal learning by aligning text with other modalities such as speech and images. These advances matter for a range of applications, including information retrieval, sentiment analysis, stock prediction, and medical diagnosis such as automatic ICD coding.
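To make the core idea concrete, here is a minimal sketch of encoding text into numerical vectors and comparing them, using a simple term-frequency bag-of-words model in plain Python (the documents, vocabulary, and function names are illustrative, not drawn from the papers below; modern methods replace these sparse counts with dense LLM-derived embeddings but follow the same encode-then-compare pattern):

```python
from collections import Counter
import math

def embed(text, vocab):
    """Encode text as a term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "the bank raised interest rates",
    "interest rates were raised by the central bank",
    "the river bank flooded",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [embed(d, vocab) for d in docs]

# The two finance sentences score higher than the finance/river pair,
# even though all three share the surface word "bank".
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Note the limitation this toy model exposes: "bank" contributes identically to both comparisons, so only the surrounding words separate the financial sense from the river sense. Contextual embeddings address exactly this ambiguity, which is the motivation for the domain-adapted and LLM-based representations in the papers below.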
Papers
Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt
Peter Anderson, Mano Vikash Janardhanan, Jason He, Wei Cheng, Charlie Flanagan
Large Language Model in Medical Informatics: Direct Classification and Enhanced Text Representations for Automatic ICD Coding
Zeyd Boukhers, AmeerAli Khan, Qusai Ramadan, Cong Yang