Generative AI Safety

Generative AI safety research focuses on mitigating the risk of harmful outputs from powerful models such as large language models (LLMs) and text-to-image diffusion models. Current efforts concentrate on techniques such as fine-grained content moderation (e.g., token-level redaction), probabilistic risk assessment frameworks for copyright and other legal exposure, and methods that improve model robustness against adversarial attacks and "jailbreaking" attempts. This work is central to responsible AI development, shaping both the trustworthiness of deployed AI systems and the design of effective safety guidelines and regulations.
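
As a concrete illustration of the token-level moderation idea, the minimal sketch below masks individual tokens whose risk score crosses a threshold instead of rejecting the whole response. The scorer, blocklist, and threshold are placeholder assumptions introduced only for demonstration; a deployed system would rely on a trained token- or span-level classifier rather than keyword matching.

```python
import re
from typing import Callable, List

REDACTION_MARK = "[REDACTED]"

def redact_tokens(
    tokens: List[str],
    risk_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Replace individual tokens whose risk score meets the threshold.

    Unlike response-level filtering, this keeps the rest of the output
    intact and only masks the offending tokens.
    """
    return [REDACTION_MARK if risk_score(tok) >= threshold else tok for tok in tokens]

# Placeholder scorer for illustration: a real system would use a trained
# token-level classifier (e.g., a toxicity or PII tagger), not a keyword list.
_BLOCKLIST = {"password", "ssn"}

def toy_risk_score(token: str) -> float:
    return 1.0 if token.lower().strip(".,!?") in _BLOCKLIST else 0.0

if __name__ == "__main__":
    text = "My SSN is on file and my password is stored in the vault."
    tokens = re.findall(r"\S+", text)
    print(" ".join(redact_tokens(tokens, toy_risk_score)))
```

Operating per token preserves the remainder of the output, which is the main appeal of fine-grained moderation over blanket response-level refusal.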

Papers