Text Clustering

Text clustering automatically groups similar text documents by content, making it possible to organize and analyze large datasets where manual labeling is impractical. Current research uses large language models (LLMs) to generate better embeddings and to interpret the resulting clusters, explores both unsupervised and supervised approaches, and incorporates techniques such as contrastive learning and attention mechanisms to enhance performance. These advances are improving the accuracy and efficiency of text clustering, with applications ranging from data augmentation in legal contexts to information retrieval and resource recommendation in digital libraries.
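
As a rough illustration of the basic pipeline (represent each document as a vector, then group documents by similarity), the following Python sketch clusters a handful of short texts. It is a minimal sketch under stated assumptions, not the method of any particular paper: plain word-count vectors stand in for LLM embeddings, a simple cosine-similarity k-means stands in for the clustering step, and the sample documents, stopword list, and function names (vectorize, cosine, cluster) are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "on", "a", "of", "in"}


def vectorize(text):
    """Turn a text into a sparse word-count vector (stand-in for an embedding)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(texts, k, max_iter=20):
    """Group texts into k clusters with a cosine-based k-means."""
    vectors = [vectorize(t) for t in texts]

    # Deterministic farthest-first initialization: start from the first
    # document, then repeatedly add the document least similar to the
    # centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels

        # Update step: a centroid is the sum of its members' vectors
        # (cosine similarity ignores scale, so no averaging is needed).
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = sum(members, Counter())
    return labels


# Hypothetical sample documents: two legal, two library-related.
docs = [
    "The court ruled on the contract dispute",
    "The judge dismissed the contract lawsuit",
    "Library catalog search returns relevant books",
    "Digital library search recommends books",
]
labels = cluster(docs, k=2)
print(labels)  # [0, 0, 1, 1]
```

In practice the word-count vectors would be replaced by embeddings from a language model, which can place documents close together even when they share no exact words; the grouping step itself would stay broadly the same.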

Papers