Unsafe Input

Unsafe input to artificial intelligence systems, particularly large language models (LLMs), poses a significant threat to safe and ethical deployment. Current research focuses on developing and evaluating methods to mitigate these risks, including backtracking to undo unsafe generations, fine-tuning models to recognize and refuse unsafe inputs, and context-adaptive decoding strategies that steer model outputs toward safer responses. These efforts aim to improve the robustness and reliability of AI systems across applications ranging from text generation and code creation to image synthesis, ultimately enhancing the safety and trustworthiness of AI technology.
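
To make the backtracking idea concrete, the sketch below shows one minimal, self-contained interpretation: generation proceeds token by token, periodically committing a checkpoint once the output passes a safety check, and rolling back to the last safe checkpoint whenever an unsafe continuation is detected. The sampler, safety check, and all names here are illustrative stand-ins, not any particular paper's method; a real system would query an actual LLM and a trained safety classifier.

```python
import random

# Toy stand-ins for a real LLM and safety classifier (illustrative only).
UNSAFE_PHRASES = {"make a weapon", "bypass the filter"}

def sample_next_token(prefix_tokens):
    # Toy sampler: a real system would draw from an LLM's next-token distribution.
    vocabulary = ["how", "to", "make", "a", "weapon", "cake", "safely", "."]
    return random.choice(vocabulary)

def is_unsafe(tokens):
    # Toy safety check: a real system would use a trained safety classifier.
    text = " ".join(tokens)
    return any(phrase in text for phrase in UNSAFE_PHRASES)

def generate_with_backtracking(prompt_tokens, max_tokens=20,
                               checkpoint_every=4, max_retries=5):
    """Generate tokens with periodic checkpoints; on an unsafe detection,
    roll back to the last safe checkpoint and resample instead of continuing."""
    output = list(prompt_tokens)
    checkpoint = len(output)
    retries = 0
    while len(output) - len(prompt_tokens) < max_tokens:
        output.append(sample_next_token(output))
        if is_unsafe(output):
            if retries >= max_retries:
                # Give up on this continuation; return only the verified-safe prefix.
                return output[:checkpoint]
            output = output[:checkpoint]   # undo the unsafe span
            retries += 1
            continue
        if len(output) - checkpoint >= checkpoint_every:
            checkpoint = len(output)       # commit the span that passed the check
            retries = 0
    return output

if __name__ == "__main__":
    print(" ".join(generate_with_backtracking(["how", "to"])))
```

The same loop structure can host the other mitigations mentioned above: a fine-tuned refusal model would intervene at the prompt stage, while a context-adaptive decoding strategy would reshape the sampling distribution inside sample_next_token rather than undoing tokens after the fact.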

Papers