Black Box Language Model

Black-box large language models (LLMs) are models whose internal parameters and computations are inaccessible to users, which makes it difficult to understand their behavior or improve their performance. Current research focuses on methods to adapt, analyze, and explain these models without direct access to their weights, using techniques such as prompt engineering, watermarking, and adversarial attacks to probe their capabilities and limitations. This work is crucial for mitigating the risks of deploying powerful yet inscrutable AI systems and for building more trustworthy and reliable language technologies.
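The defining constraint can be made concrete: a black-box model is just an opaque text-in, text-out callable, and any probing method may only observe its responses. The sketch below illustrates this interface with a hypothetical `toy_model` stand-in (a real setting would call a remote API instead); the probing loop itself is the part that carries over.

```python
# A black-box model is an opaque callable: text in, text out.
# `toy_model` is a hypothetical stand-in for a real, inaccessible LLM;
# the caller cannot inspect its parameters, only its responses.

def toy_model(prompt: str) -> str:
    # Internals hidden from the caller; we observe only the output.
    if "capital of France" in prompt:
        return "Paris"
    return "I don't know."

def probe(model, prompts):
    """Query the black box and record (prompt, response) pairs.

    This is the only kind of access black-box analysis methods
    (prompting, behavioral testing, adversarial probing) rely on.
    """
    return [(p, model(p)) for p in prompts]

results = probe(toy_model, [
    "What is the capital of France?",
    "What is the capital of Atlantis?",
])
for prompt, response in results:
    print(f"{prompt!r} -> {response!r}")
```

Because the probe only depends on the call interface, the same loop works unchanged whether `model` wraps a local function, an HTTP endpoint, or a rate-limited commercial API.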

Papers