Novel Shielding Mechanisms
In reinforcement learning, a shield is a runtime monitor that checks an agent's proposed actions against a safety specification and overrides those that would violate it. Novel shielding mechanisms extend this idea to improve the safety and robustness of AI systems more broadly, particularly reinforcement learning agents and large language models. Current research focuses on techniques such as prompt tuning for LLMs, adaptive shielding based on contrastive autoencoders for RL agents, and shields that incorporate human preferences to improve transparency and safety. These techniques aim to mitigate risks from adversarial attacks, unexpected behaviors, and resource constraints in real-world deployments, leading to more reliable and trustworthy systems.
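To make the core idea concrete, the minimal sketch below shows the control flow of a shield wrapped around an agent's action selection. The `is_safe` and `safe_actions` callbacks are hypothetical stand-ins for whatever safety specification a particular method uses (e.g. a temporal-logic monitor or a learned safety model); they are assumptions for illustration, not an API from the papers surveyed here.

```python
from typing import Callable, Sequence

def shielded_action(
    state,
    proposed_action,
    is_safe: Callable[[object, object], bool],
    safe_actions: Callable[[object], Sequence[object]],
):
    """Return the agent's proposed action if the shield accepts it,
    otherwise override it with a safe alternative."""
    # Let safe actions pass through unchanged so the shield only
    # interferes when the safety specification would be violated.
    if is_safe(state, proposed_action):
        return proposed_action
    # Override: fall back to the first alternative the shield accepts.
    for candidate in safe_actions(state):
        if is_safe(state, candidate):
            return candidate
    raise RuntimeError("Shield found no safe action in this state")
```

A shield of this form sits between the policy and the environment, so the underlying agent can be trained or prompted as usual while unsafe actions are intercepted at execution time; adaptive variants replace the fixed `is_safe` check with a learned component.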