CodeRAG-Bench
CodeRAG-Bench is a benchmark designed to evaluate how effectively retrieval-augmented generation (RAG) improves code generation by large language models (LLMs). Research in this area focuses on identifying scenarios where retrieving external context (such as documentation or code examples) measurably benefits code generation, and on understanding the limitations of existing retrieval and generation methods. The benchmark matters because it provides a standardized evaluation framework for developing more robust and efficient retrieval-augmented code generation, with direct relevance to software development and AI-assisted programming.
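For readers unfamiliar with the setup, the sketch below illustrates the retrieve-then-generate loop that such a benchmark evaluates: score a corpus of documentation snippets against the coding task, prepend the top hits to the prompt, and hand the prompt to a code LLM. The corpus, the lexical scoring function, and the helper names (score, retrieve, build_prompt) are illustrative assumptions, not the benchmark's actual retrievers or datasets.

# Minimal sketch of retrieval-augmented code generation.
# The corpus and helpers below are illustrative, not CodeRAG-Bench internals.
from collections import Counter

# Toy retrieval corpus: documentation snippets or code examples.
CORPUS = [
    "pandas.DataFrame.merge(right, how='inner', on=None): database-style join of two DataFrames.",
    "re.findall(pattern, string): return all non-overlapping matches of pattern as a list of strings.",
    "itertools.groupby(iterable, key=None): return consecutive keys and groups from the iterable.",
]

def score(query: str, doc: str) -> float:
    # Simple lexical-overlap score; a stand-in for BM25 or a dense retriever.
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Return the k highest-scoring documents for the query.
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(task: str, contexts: list[str]) -> str:
    # Prepend retrieved context to the code-generation instruction.
    context_block = "\n".join(f"# Context: {c}" for c in contexts)
    return f"{context_block}\n# Task: {task}\n# Write a Python function that solves the task.\n"

if __name__ == "__main__":
    task = "Merge two DataFrames on a shared column using an inner join."
    prompt = build_prompt(task, retrieve(task, CORPUS))
    print(prompt)  # This augmented prompt would then be passed to a code LLM.

In practice, the interesting question the benchmark probes is when this extra context helps (e.g., unfamiliar library APIs) versus when it adds noise that hurts generation.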
Papers
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
CodeRAG-Bench: Can Retrieval Augment Code Generation?
Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, Daniel Fried