Dialogue Evaluation
Dialogue evaluation aims to automatically assess the quality of conversations generated by AI systems, with the goal of aligning automated scores with human judgments of factors such as coherence, fluency, and relevance. Current research relies heavily on large language models (LLMs), often fine-tuned or prompted for specific evaluation tasks, to build automated metrics and datasets for benchmarking dialogue systems across languages and domains. The field is central to developing more human-like and effective conversational AI, shaping both research methodology and the practical deployment of chatbots and other dialogue agents.
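The prompting approach mentioned above is commonly realized as an "LLM-as-judge" setup: the dialogue context and a candidate response are inserted into a rubric-style prompt, and the model's rating is parsed into numeric scores. The sketch below illustrates this pattern; the rubric wording, the 1-5 scale, and the `call_llm` stub are illustrative assumptions, not any particular paper's protocol or a real API.

```python
import re
from statistics import mean

# Illustrative rubric prompt; the criteria and 1-5 scale are assumptions.
RUBRIC = """You are evaluating a chatbot's response within a conversation.
Rate the response on a 1-5 scale for each criterion:
- Coherence: does it fit the conversation so far?
- Fluency: is it grammatical and natural?
- Relevance: does it address the user's last message?

Conversation:
{context}

Response to evaluate:
{response}

Answer with three lines, e.g.:
Coherence: 4
Fluency: 5
Relevance: 3
"""


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (API or local model).

    Returns a canned answer so the sketch runs end to end.
    """
    return "Coherence: 4\nFluency: 5\nRelevance: 3"


def score_turn(context: str, response: str) -> dict:
    """Prompt the judge model and parse one integer score per criterion."""
    raw = call_llm(RUBRIC.format(context=context, response=response))
    scores = {}
    for criterion in ("Coherence", "Fluency", "Relevance"):
        match = re.search(rf"{criterion}:\s*([1-5])", raw)
        if match:
            scores[criterion] = int(match.group(1))
    return scores


if __name__ == "__main__":
    context = "User: Can you recommend a book on reinforcement learning?"
    response = ("Sure! 'Reinforcement Learning: An Introduction' by "
                "Sutton and Barto is the standard text.")
    scores = score_turn(context, response)
    print(scores)
    print("Overall:", mean(scores.values()))
```

In practice the stub would be replaced by a call to a hosted or locally served model, per-turn scores would be aggregated over a dialogue or dataset, and the resulting metric is typically validated against human annotations, for example via Spearman or Pearson correlation.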