Dialogue Evaluation
Dialogue evaluation aims to automatically assess the quality of conversations generated by AI systems, with the goal of aligning automated scores with human judgments of factors such as coherence, fluency, and relevance. Current research relies heavily on large language models (LLMs), often fine-tuned or prompted for specific evaluation tasks, to build automated metrics and benchmark datasets for dialogue systems across multiple languages and domains. This field is crucial for developing more human-like and effective conversational AI, shaping both research methodology and the practical deployment of chatbots and other dialogue agents.
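As a rough illustration of the prompted-LLM ("LLM-as-judge") approach mentioned above, and not drawn from any particular paper, the sketch below asks an LLM to rate a dialogue's coherence and then measures agreement with human judgments via Spearman rank correlation. The prompt wording, the call_llm helper, and the data are hypothetical placeholders to be replaced with your own model and benchmark.

```python
# Minimal sketch of LLM-prompted dialogue evaluation plus meta-evaluation
# against human ratings. `call_llm`, the prompt text, and the inputs are
# placeholders, not an implementation from the source.
from scipy.stats import spearmanr

PROMPT = (
    "Rate the coherence of the final response in this dialogue on a scale "
    "from 1 (incoherent) to 5 (fully coherent). Reply with a single number.\n\n"
    "Dialogue:\n{dialogue}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM API or local model is used as the judge."""
    raise NotImplementedError

def score_dialogue(dialogue: str) -> float:
    """Prompt the LLM judge and parse its numeric rating."""
    reply = call_llm(PROMPT.format(dialogue=dialogue))
    return float(reply.strip())

def meta_evaluate(dialogues: list[str], human_scores: list[float]):
    """Correlate automated scores with human judgments; higher rho means
    the automated metric tracks human quality ratings more closely."""
    llm_scores = [score_dialogue(d) for d in dialogues]
    rho, p_value = spearmanr(llm_scores, human_scores)
    return rho, p_value
```

In practice, reported results in this area typically hinge on exactly this kind of correlation analysis: a proposed metric is judged by how strongly its scores correlate with human annotations on held-out dialogues.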