Response Evaluation
Response evaluation assesses the quality and appropriateness of generated text, particularly in dialogue systems and other AI applications. Current research emphasizes automated evaluation methods that use large language models (LLMs) and techniques such as reinforcement learning to rank and select responses, often incorporating discriminative models or human-like judgment criteria such as interlocutor awareness and dialogue continuity. These advances aim to make model training more efficient and effective by reducing reliance on expensive human annotation, while also improving the quality and user experience of AI-generated conversations and other outputs.
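The sketch below illustrates the general idea of automated response ranking described above: candidate replies to a dialogue context are scored by a judge and sorted best-first. The `llm_judge` callable and the `toy_judge` stand-in are illustrative assumptions, not any specific paper's method; in practice the judge would be an LLM prompted for a quality score or a trained discriminative model.

```python
# Minimal sketch of automated response ranking with a pluggable judge.
# `llm_judge` is a hypothetical callable standing in for any scoring
# backend (LLM-as-judge prompt, API call, or a discriminative reward model).

from typing import Callable, List, Tuple


def rank_responses(
    context: str,
    candidates: List[str],
    llm_judge: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate reply to `context` and return them best-first."""
    scored = [(response, llm_judge(context, response)) for response in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_judge(context: str, response: str) -> float:
    """Stand-in judge for demonstration only: rewards lexical overlap with the
    context and lightly rewards length, penalizing empty replies. A real
    system would query an LLM or a learned discriminative model instead."""
    if not response.strip():
        return 0.0
    overlap = len(set(context.lower().split()) & set(response.lower().split()))
    return overlap + 0.1 * len(response.split())


if __name__ == "__main__":
    context = "User: My order arrived damaged. What should I do?"
    candidates = [
        "I'm sorry your order arrived damaged. You can request a replacement "
        "or a refund from your order page.",
        "Okay.",
        "Damaged items happen sometimes.",
    ]
    for response, score in rank_responses(context, candidates, toy_judge):
        print(f"{score:5.1f}  {response}")
```

The judge is passed in as a parameter so the same ranking loop can be reused whether responses are scored by a prompted LLM, a discriminative classifier, or a reward model trained with reinforcement learning.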
Papers
Eight papers, published between June 10, 2022 and October 13, 2024.