Harmful Content
Harmful content generation and detection in large language models (LLMs) and text-to-image diffusion models is a rapidly evolving research area focused on mitigating the risks of bias, toxicity, and misinformation. Current research emphasizes preventing harmful outputs through techniques such as attention re-weighting, prompt engineering, and unlearning of harmful knowledge, often within multimodal and continual-learning frameworks. This work is crucial for the responsible development and deployment of AI systems, affecting both the safety of online environments and the ethical standards applied to AI.
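To make the attention re-weighting idea above concrete, the sketch below down-weights the cross-attention a generator pays to prompt tokens flagged as harmful concepts, then renormalizes so the output remains a convex combination of the value vectors. It is a minimal, self-contained PyTorch illustration under stated assumptions: the function name, the `harmful_token_mask` input, and the `suppression` factor are hypothetical and do not correspond to any specific paper's or library's interface.

```python
import torch


def reweighted_cross_attention(q, k, v, harmful_token_mask, suppression=0.1):
    """Cross-attention with harmful-concept tokens suppressed (illustrative sketch).

    q: (batch, n_query, d)        query features, e.g. image latents
    k, v: (batch, n_tokens, d)    key/value features from the text prompt
    harmful_token_mask: (batch, n_tokens) bool, True for tokens to suppress
    suppression: multiplicative factor (< 1) applied to attention on flagged tokens
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (batch, n_query, n_tokens)
    weights = scores.softmax(dim=-1)

    # Down-weight attention paid to flagged tokens, then renormalize per query.
    scale = torch.where(
        harmful_token_mask.unsqueeze(1),                  # broadcast over queries
        torch.full_like(weights, suppression),
        torch.ones_like(weights),
    )
    weights = weights * scale
    weights = weights / weights.sum(dim=-1, keepdim=True)

    return weights @ v                                    # (batch, n_query, d)


if __name__ == "__main__":
    # Toy usage: 1 sample, 4 queries, 6 prompt tokens, 8-dim features.
    q = torch.randn(1, 4, 8)
    k = torch.randn(1, 6, 8)
    v = torch.randn(1, 6, 8)
    mask = torch.tensor([[False, False, True, False, True, False]])
    out = reweighted_cross_attention(q, k, v, mask)
    print(out.shape)  # torch.Size([1, 4, 8])
```

Keeping a small nonzero `suppression` rather than zeroing the flagged tokens outright is one possible design choice; it avoids degenerate attention distributions when most of a prompt is flagged.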