Sentence Embeddings
Sentence embeddings represent sentences as dense vectors that aim to capture semantic meaning for downstream natural language processing tasks. Current research focuses on improving embedding quality through contrastive learning, domain adaptation (especially for low-resource languages), and analysis of the internal structure of embeddings to better understand how linguistic information is encoded. These advances matter because effective sentence embeddings underpin applications ranging from semantic search and text classification to machine translation and recommendation systems.
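As a concrete illustration of the contrastive-learning idea mentioned above, here is a minimal sketch of an InfoNCE-style objective over paired sentence embeddings, in the spirit of SimCSE-style training with in-batch negatives. The function name, temperature value, and toy data are illustrative assumptions, not details taken from any of the papers listed below.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style contrastive loss over paired sentence embeddings.

    emb_a[i] and emb_b[i] are embeddings of two views of the same sentence
    (e.g. dropout-augmented encodings); every other pair in the batch acts
    as an in-batch negative.
    """
    # L2-normalize so dot products become cosine similarities.
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    # Similarity matrix: entry (i, j) = cos(a_i, b_j), scaled by temperature.
    logits = a @ b.T / temperature
    # The positive for row i sits on the diagonal, so the target is index i.
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Toy usage with random tensors standing in for encoder outputs.
batch, dim = 8, 128
emb_a = torch.randn(batch, dim)
emb_b = emb_a + 0.1 * torch.randn(batch, dim)  # noisy second view
print(info_nce_loss(emb_a, emb_b).item())
```

Minimizing this loss pulls the two views of each sentence together while pushing apart embeddings of different sentences in the batch, which is the core mechanism behind most contrastive sentence-embedding methods.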
Papers
An Incremental Clustering Baseline for Event Detection on Twitter
Marjolaine Ray (Lattice), Qi Wang (Lattice), Frédérique Mélanie-Becquet (Lattice), Thierry Poibeau (Lattice), Béatrice Mazoyer (médialab)
Token Prepending: A Training-Free Approach for Eliciting Better Sentence Embeddings from LLMs
Yuchen Fu, Zifeng Cheng, Zhiwei Jiang, Zhonghui Wang, Yafeng Yin, Zhengliang Li, Qing Gu
Leveraging Estimated Transferability Over Human Intuition for Model Selection in Text Ranking
Jun Bai, Zhuofan Chen, Zhenzi Li, Hanhua Hong, Jianfei Zhang, Chen Li, Chenghua Lin, Wenge Rong
Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint
Dayeon Ki, Cheonbok Park, Hyunjoong Kim