Weapons of Mass Destruction Proxy

Weapons of Mass Destruction Proxy (WMDP) research develops methods to assess and mitigate the risk that large language models (LLMs) are misused to create or disseminate information related to weapons of mass destruction. Current work explores techniques such as proxy-based federated learning, which analyzes hazardous knowledge while preserving privacy, and develops benchmarks and unlearning algorithms to measure and reduce the harmful capabilities of LLMs. The goal is to make LLMs safer and more secure while improving our understanding of how to detect and prevent malicious use.

Papers