Unethical Content

Research on unethical content generated by large language models (LLMs) focuses on identifying and mitigating the production of harmful or biased outputs. Current efforts concentrate on developing robust evaluation methods, including those that leverage psychometric principles and automated testing procedures to detect biases and vulnerabilities in models like GPT-3, GPT-4, and various Llama models. This work is crucial for improving the safety and ethical implications of LLMs, impacting both the development of more responsible AI systems and the broader understanding of AI's societal impact. The ultimate goal is to create LLMs that are less susceptible to generating unethical content, regardless of input manipulation or instruction-centric prompting.

Papers