Weapons of Mass Destruction Proxy
Weapons of Mass Destruction Proxy (WMDP) research develops methods to measure and reduce the risk that large language models (LLMs) are misused to create or spread hazardous knowledge about weapons of mass destruction, in domains such as biosecurity, chemical security, and cybersecurity. Current work explores proxy-based federated learning to analyze hazardous knowledge while preserving privacy, and builds benchmarks and unlearning algorithms that evaluate and reduce the harmful capabilities of LLMs. The aim is to make LLMs safer and more secure for responsible deployment, while improving our understanding of how malicious use can be detected and prevented.
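Benchmarks of this kind are commonly scored as multiple-choice accuracy, and a successful unlearning method is usually expected to push accuracy on the hazardous questions toward random chance. The following is a minimal sketch of that scoring logic, assuming four-option questions. The `Item` class, the `accuracy` and `chance_level` helpers, and all data are hypothetical placeholders. They do not come from any real benchmark or official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A hypothetical multiple-choice item (placeholder, not real benchmark data)."""
    question: str
    choices: list[str]  # assumed four options per item
    answer: int         # index of the correct choice


def accuracy(items: list[Item], predictions: list[int]) -> float:
    """Fraction of items where the predicted choice index matches the answer key."""
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return correct / len(items)


def chance_level(items: list[Item]) -> float:
    """Expected accuracy of uniform random guessing over each item's choices."""
    return sum(1 / len(item.choices) for item in items) / len(items)


# Hypothetical placeholder items whose answer keys cycle through 0, 1, 2, 3.
items = [Item(f"placeholder question {i}", ["A", "B", "C", "D"], i % 4) for i in range(8)]

before = [i % 4 for i in range(8)]  # hypothetical model before unlearning: always correct
after = [0] * 8                     # hypothetical model after unlearning: always picks "A"

print(accuracy(items, before))  # 1.0
print(accuracy(items, after))   # 0.25
print(chance_level(items))      # 0.25
```

In practice, evaluations typically also track accuracy on a general-knowledge benchmark, because an unlearning method that reaches chance on hazardous questions only by degrading the whole model would be of little use.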
Papers
November 18, 2024
October 25, 2024
October 9, 2024
September 12, 2024
July 24, 2024
June 4, 2024
March 5, 2024
February 15, 2024
October 12, 2023
September 19, 2023
June 14, 2023
November 2, 2022