Harmful Content
The generation and detection of harmful content in large language models (LLMs) and text-to-image diffusion models form a rapidly evolving research area focused on mitigating the risks of bias, toxicity, and misinformation. Current research emphasizes preventing harmful outputs through techniques such as attention re-weighting, prompt engineering, and unlearning of harmful knowledge, often within multimodal and continual-learning frameworks. This work is crucial for the responsible development and deployment of AI systems, affecting both the safety of online environments and the broader ethics of AI development.
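As a concrete illustration of the detection side of this area, the short sketch below flags toxic text with an off-the-shelf classifier from the Hugging Face Hub. The model name (unitary/toxic-bert), its "toxic" label, and the 0.5 threshold are assumptions made for illustration, not the method of any specific paper surveyed here.

```python
# A minimal sketch of harmful-content detection, assuming an off-the-shelf
# toxicity classifier from the Hugging Face Hub. The model name
# ("unitary/toxic-bert"), its "toxic" label, and the 0.5 threshold are
# illustrative assumptions, not the approach of any particular paper.
from transformers import pipeline

detector = pipeline("text-classification", model="unitary/toxic-bert")


def is_harmful(text: str, threshold: float = 0.5) -> bool:
    """Return True if the classifier's 'toxic' score exceeds the threshold."""
    # top_k=None returns a score for every label rather than only the top one.
    scores = detector(text, truncation=True, top_k=None)
    return any(s["label"] == "toxic" and s["score"] >= threshold for s in scores)


if __name__ == "__main__":
    for sample in ["Have a great day!", "I am going to hurt you."]:
        print(f"{sample!r} -> {'harmful' if is_harmful(sample) else 'benign'}")
```

In practice, such a classifier is typically only one layer of a moderation pipeline, combined with prompt-level safeguards and model-side interventions like those named above.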