Unsafe Content
Research on unsafe content generation studies how AI models, particularly large language models (LLMs) and text-to-image models, produce harmful or inappropriate outputs, and how such outputs can be identified and mitigated. Current work applies deep learning architectures such as transformers and diffusion models to detect and prevent unsafe content, using methods such as attention reweighting, instruction tuning, and reinforcement learning-based rewriting. This research matters for the responsible development and deployment of AI systems, affecting both the safety of online environments and the broader trustworthiness of AI technologies.
Papers
January 7, 2025
November 10, 2024
October 24, 2024
October 16, 2024
October 6, 2024
October 4, 2024
October 1, 2024
September 27, 2024
September 17, 2024
June 20, 2024
June 5, 2024
May 24, 2024
April 10, 2024
March 27, 2024
March 16, 2024
December 3, 2023
November 19, 2023
October 10, 2023
September 20, 2023