Harmful Content

Research on harmful content generation and detection in large language models (LLMs) and text-to-image diffusion models is a rapidly evolving area focused on mitigating the risks of bias, toxicity, and misinformation. Current work emphasizes methods for preventing harmful outputs, such as attention re-weighting, prompt engineering, and unlearning of harmful knowledge, often within multimodal and continual-learning frameworks. This line of research is crucial for the responsible development and deployment of AI systems, affecting both the safety of online environments and the ethical considerations surrounding how these models are built and used.
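To make the attention re-weighting idea concrete, the sketch below shows one simple (illustrative, not drawn from any specific paper) way to damp cross-attention toward prompt tokens flagged as harmful before the attention weights are applied. The function and parameter names (`re_weighted_attention`, `harmful_token_ids`, `suppression_factor`) are assumptions for illustration only.

```python
# Minimal sketch of cross-attention re-weighting for concept suppression.
# Assumption: we already know which prompt-token positions correspond to a
# harmful concept; we scale down their attention weights and re-normalize.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def re_weighted_attention(Q, K, V, harmful_token_ids, suppression_factor=0.1):
    """Scaled dot-product cross-attention with flagged prompt tokens damped.

    Q: (num_image_tokens, d)   queries from the image/latent side
    K, V: (num_prompt_tokens, d)  keys/values from the text encoder
    harmful_token_ids: prompt-token indices to suppress
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)                 # (num_image_tokens, num_prompt_tokens)
    weights = softmax(logits, axis=-1)
    weights[:, harmful_token_ids] *= suppression_factor  # damp flagged tokens
    weights /= weights.sum(axis=-1, keepdims=True)        # re-normalize rows
    return weights @ V

# Toy usage: suppress attention to prompt positions 2 and 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 64))
K = rng.normal(size=(8, 64))
V = rng.normal(size=(8, 64))
out = re_weighted_attention(Q, K, V, harmful_token_ids=[2, 3])
print(out.shape)  # (16, 64)
```

In practice, published methods differ in how the flagged tokens are identified and where in the model the re-weighting is applied; the example above only illustrates the general mechanism.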

Papers