Unsafe Input
Unsafe input in artificial intelligence, particularly in large language models (LLMs), poses a significant threat to system safety and ethical deployment. Current research focuses on developing and evaluating methods to mitigate these risks, including backtracking to undo unsafe generations, fine-tuning models to recognize and refuse unsafe inputs, and context-adaptive decoding strategies that steer model outputs toward safer responses. These efforts aim to improve the robustness and reliability of AI systems across applications ranging from text generation and code creation to image synthesis, ultimately enhancing the safety and trustworthiness of AI technology.
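To make the backtracking idea concrete, the sketch below shows a minimal generation loop that discards and resamples any chunk a safety check flags. It is an illustration only: sample_next_chunk and is_unsafe are hypothetical stand-ins for an LLM's sampler and a learned safety classifier, not any specific paper's method.

```python
import random

def sample_next_chunk(context: str) -> str:
    """Toy stand-in for one LLM sampling step: returns the next text chunk."""
    return random.choice(["a helpful sentence. ", "an unsafe sentence. "])

def is_unsafe(text: str) -> bool:
    """Toy stand-in for a safety classifier flagging disallowed content."""
    return "unsafe" in text

def generate_with_backtracking(prompt: str, max_chunks: int = 5, max_retries: int = 3) -> str:
    """Generate chunk by chunk; when a chunk is flagged unsafe, drop it
    (the backtracking step) and resample, refusing after repeated failures."""
    output = prompt
    for _ in range(max_chunks):
        for _attempt in range(max_retries):
            chunk = sample_next_chunk(output)
            if not is_unsafe(chunk):
                output += chunk  # accept the safe continuation
                break
            # unsafe: discard the chunk and try again
        else:
            return output + "[generation stopped: no safe continuation found]"
    return output

if __name__ == "__main__":
    print(generate_with_backtracking("User asked a question. "))
```

Context-adaptive decoding follows a similar pattern, but instead of discarding whole chunks it adjusts token probabilities during sampling based on the safety assessment of the current context.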