Unsafe Content

Unsafe content generation by AI models, particularly large language models (LLMs) and text-to-image models, is a significant research area focused on identifying and mitigating harmful or inappropriate outputs. Current efforts apply deep learning architectures such as transformers and diffusion models to detect and prevent unsafe content, using methods like attention reweighting, instruction tuning, and reinforcement learning-based rewriting. This research is crucial for the responsible development and deployment of AI systems, affecting both the safety of online environments and the broader trustworthiness of AI technologies.
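
To make the detection side concrete, the sketch below screens text with a pretrained transformer safety classifier. It is a minimal illustration, not the method of any particular paper: the unitary/toxic-bert checkpoint (a publicly available multi-label toxicity classifier) and the 0.5 score threshold are assumptions chosen for demonstration.

```python
# Minimal sketch of transformer-based unsafe-content detection.
# Assumptions: the unitary/toxic-bert checkpoint and a 0.5 threshold,
# both illustrative; production filters use purpose-trained models
# and policy-specific thresholds.
from transformers import pipeline

# Load a pretrained safety classifier; any sequence-classification
# model exposing per-label scores fits the same interface.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

def is_unsafe(text: str, threshold: float = 0.5) -> bool:
    """Flag text if any unsafe-category score exceeds the threshold."""
    # top_k=None returns scores for every label
    # (toxic, severe_toxic, obscene, threat, insult, identity_hate).
    scores = classifier(text, truncation=True, top_k=None)
    return any(label["score"] >= threshold for label in scores)

if __name__ == "__main__":
    for sample in ["Have a great day!", "I am going to hurt you."]:
        verdict = "unsafe" if is_unsafe(sample) else "safe"
        print(f"{sample!r} -> {verdict}")
```

In the generation-side mitigations mentioned above (attention reweighting in diffusion models, instruction tuning, reinforcement learning-based rewriting), a classifier of this kind typically supplies the filtering or reward signal that steers the model away from unsafe outputs.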

Papers