Paraphrase Dataset
Paraphrase datasets are collections of sentence pairs that express the same meaning in different words. They are crucial for training and evaluating natural language processing (NLP) models. Current research focuses on building larger, higher-quality datasets with greater lexical and syntactic diversity, often by generating paraphrases with large language models (LLMs) or with techniques such as back-translation, to overcome the limitations of existing resources. Such datasets support NLP tasks including paraphrase generation, paraphrase detection, and semantic search, and help make the applications built on those tasks more robust and accurate.
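One simple way to make "lexical diversity" concrete is to measure how little vocabulary a paraphrase shares with its source sentence. The Python sketch below assumes a plain whitespace tokenizer and uses 1 minus the Jaccard similarity of the two token sets as the score. The function names are hypothetical, and this is an illustrative metric rather than the one used by any particular dataset or paper.

```python
from typing import List, Tuple

# Punctuation stripped from token edges; a simplifying assumption for illustration.
PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: split on whitespace, strip edge punctuation, lowercase."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [t for t in tokens if t]  # drop tokens that were pure punctuation


def lexical_diversity(source: str, paraphrase: str) -> float:
    """1 - Jaccard similarity of token sets.

    0.0 means both sentences use exactly the same vocabulary;
    1.0 means they share no words at all.
    """
    a, b = set(tokenize(source)), set(tokenize(paraphrase))
    if not a and not b:
        return 0.0  # two empty strings: treat as identical
    return 1.0 - len(a & b) / len(a | b)


def mean_diversity(pairs: List[Tuple[str, str]]) -> float:
    """Average lexical diversity over (source, paraphrase) pairs in a dataset."""
    if not pairs:
        return 0.0
    return sum(lexical_diversity(s, p) for s, p in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ("The cat sat on the mat.", "A cat was sitting on the rug."),
        ("Hello world", "hello, world!"),
    ]
    for src, para in pairs:
        print(f"{lexical_diversity(src, para):.3f}  {src!r} -> {para!r}")
    print(f"mean: {mean_diversity(pairs):.3f}")
```

A high score only indicates surface-level rewording; it says nothing about whether the meaning was preserved, so in practice a measure like this would be paired with a semantic-similarity check.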