Auditing Delphi
Auditing Delphi and similar large language models (LLMs) focuses on evaluating their performance and biases, particularly on controversial topics and moral reasoning. Current research investigates how these models handle complex prompts, analyzing their responses across diverse political viewpoints and social contexts, and applies techniques such as reinforcement learning and curriculum learning to improve prompt following and question generation. This work is crucial for understanding and mitigating potential biases in LLMs, with the ultimate aim of improving their reliability and addressing their ethical implications in a range of applications.
Papers
December 7, 2023
October 27, 2023
June 22, 2023
December 20, 2022