Black Box LLM
Black-box large language models (LLMs) are models that can be accessed only through their inputs and outputs, with their weights and internal computations hidden from the user. They are a focus of intense research aimed at understanding their vulnerabilities and improving their safety and reliability. Current work explores adversarial attacks (e.g., "jailbreaking" through prompt manipulation), optimization of token usage for efficiency, and techniques for evaluating and improving robustness and alignment with human values, often using reinforcement learning and iterative distillation. These investigations are crucial for mitigating the risks of deploying LLMs in real-world applications and for developing more trustworthy and beneficial AI systems.
Papers
December 6, 2024
November 29, 2024
October 28, 2024
October 2, 2024
October 1, 2024
August 11, 2024
August 6, 2024
May 24, 2024
April 30, 2024
April 26, 2024
April 21, 2024
March 14, 2024
January 18, 2024
January 6, 2024
November 18, 2023
October 25, 2023
June 22, 2023
June 12, 2023
June 5, 2023