Unsupervised Sentence Embedding

Unsupervised sentence embedding aims to create meaningful vector representations of sentences without labeled data, enabling a range of downstream NLP tasks. Current research focuses on improving embedding quality by addressing biases inherent in pre-trained language models (e.g., position bias, word frequency bias) through techniques such as contrastive learning, data augmentation (including domain-specific augmentation), and explicit debiasing methods. These approaches build on architectures such as autoencoders and transformers, often incorporating hierarchical or instance-smoothing strategies to strengthen semantic representation and reduce noise. The resulting gains in semantic textual similarity and related benchmarks have direct implications for applications such as information retrieval and text classification.
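To make the contrastive-learning idea concrete, here is a minimal sketch of the InfoNCE objective commonly used in this line of work (e.g., SimCSE-style training, where two noisy "views" of the same sentence form a positive pair and all other sentences in the batch serve as negatives). This is an illustrative NumPy implementation, not any specific paper's code: the encoder is replaced by random vectors plus small noise standing in for dropout-based augmentation, and all names are hypothetical.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.05):
    """Contrastive (InfoNCE) loss over a batch of embedding pairs.

    z1[i] and z2[i] are two augmented views of sentence i (positives);
    every other pairing within the batch acts as a negative.
    """
    # L2-normalize so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matched pair) as the target class
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))

rng = np.random.default_rng(0)
batch, dim = 8, 32
base = rng.normal(size=(batch, dim))
# Two views per "sentence": base embedding plus small perturbations,
# standing in for dropout-based augmentation of a real encoder
view1 = base + 0.01 * rng.normal(size=(batch, dim))
view2 = base + 0.01 * rng.normal(size=(batch, dim))

aligned = info_nce_loss(view1, view2)
shuffled = info_nce_loss(view1, rng.permutation(view2))
# Correctly matched pairs should yield a much lower loss than
# randomly shuffled ones, which is what training exploits
print(aligned, shuffled)
```

Minimizing this loss pulls the two views of each sentence together while pushing apart unrelated sentences, which is the mechanism by which contrastive methods sharpen semantic similarity without any labels.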

Papers