Dialog Evaluation

Dialog evaluation aims to automatically assess the quality of conversations generated by AI systems, with the goal of aligning automated scores with human judgments of conversational fluency, engagement, and overall quality. Current research moves beyond simple word-overlap metrics, exploring techniques such as conditional mutual information, psychologically grounded metrics (e.g., measures of emotional expression and empathy), and large language models used as automated evaluators via prompting. These advances are crucial for developing and deploying more human-like and socially responsible conversational AI systems.
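As a concrete illustration of the prompting-based approach, the sketch below builds an evaluation prompt for an LLM judge and parses its ratings. The rubric wording, function names, and score format are illustrative assumptions, not taken from any specific paper, and the actual model call is omitted: in practice the prompt would be sent to an LLM and the reply passed to the parser.

```python
import re

# Hypothetical rubric; real systems tune this wording carefully.
RUBRIC = (
    "Rate the assistant's last response on a 1-5 scale for each dimension:\n"
    "fluency, engagement, overall quality.\n"
    "Answer in the form: fluency=X engagement=Y overall=Z"
)

def build_judge_prompt(context, response):
    """Assemble a single evaluation prompt for an LLM judge.

    `context` is a list of (speaker, utterance) turns preceding the
    response being scored.
    """
    history = "\n".join(f"{speaker}: {utt}" for speaker, utt in context)
    return f"{RUBRIC}\n\nDialog:\n{history}\nassistant: {response}\n\nScores:"

def parse_scores(judge_reply):
    """Extract the three 1-5 ratings from the judge's reply as a dict."""
    match = re.search(r"fluency=(\d)\s+engagement=(\d)\s+overall=(\d)", judge_reply)
    if not match:
        raise ValueError("judge reply did not match the expected format")
    keys = ("fluency", "engagement", "overall")
    return {k: int(v) for k, v in zip(keys, match.groups())}

# Example with a mocked judge reply instead of a live model call.
prompt = build_judge_prompt(
    [("user", "Any tips for learning guitar?")],
    "Start with simple chords and practice a little every day.",
)
scores = parse_scores("fluency=5 engagement=4 overall=4")
```

Aggregating such per-dimension scores over many dialogs, and checking their correlation with human ratings, is the usual way these prompted evaluators are validated.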

Papers