Robustness Evaluation Benchmark
Robustness evaluation benchmarks assess the reliability of machine learning models across domains such as image classification, code generation, and natural language processing tasks like Text-to-SQL. Current research focuses on building comprehensive benchmarks that measure model performance under adversarial attacks, distribution shifts, and noisy or incomplete inputs, typically targeting convolutional neural networks (CNNs) and transformer architectures. These benchmarks expose weaknesses in existing models and guide the development of more robust algorithms, and the resulting insights feed back into training techniques and architectures, improving the safety and trustworthiness of AI systems deployed in real-world applications.
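The common recipe behind such benchmarks is to evaluate the same model on clean inputs and on systematically perturbed versions of those inputs, then report the performance drop. The sketch below illustrates that pattern with a toy NumPy classifier and Gaussian-noise corruption; the `predict` function and the synthetic data are hypothetical placeholders for illustration only, not drawn from either paper listed here.

```python
# Minimal sketch of perturbation-based robustness evaluation (illustrative only):
# compare a classifier's accuracy on clean inputs with its accuracy on the same
# inputs corrupted by Gaussian noise of increasing severity.
import numpy as np

rng = np.random.default_rng(0)


def predict(inputs: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder classifier: thresholds the mean pixel value.
    # In a real benchmark this would be a trained CNN or transformer.
    return (inputs.mean(axis=(1, 2)) > 0.5).astype(int)


def accuracy(preds: np.ndarray, labels: np.ndarray) -> float:
    return float((preds == labels).mean())


# Toy evaluation set: 200 grayscale 8x8 "images" with binary labels.
images = rng.random((200, 8, 8))
labels = (images.mean(axis=(1, 2)) > 0.5).astype(int)

clean_acc = accuracy(predict(images), labels)

# Perturbed copies: additive Gaussian noise at several severities, a common
# stand-in for distribution shift; the accuracy drop is the robustness metric.
for sigma in (0.05, 0.1, 0.2):
    noisy = np.clip(images + rng.normal(0.0, sigma, images.shape), 0.0, 1.0)
    noisy_acc = accuracy(predict(noisy), labels)
    print(f"sigma={sigma:.2f}  clean={clean_acc:.3f}  "
          f"perturbed={noisy_acc:.3f}  drop={clean_acc - noisy_acc:.3f}")
```

The same clean-versus-perturbed comparison generalizes to the papers below: ReCode perturbs code-generation prompts (e.g., docstrings and identifiers) and measures the change in generation quality, while the Text-to-SQL work perturbs database tables and measures the change in query accuracy.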
Papers
ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang
Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation
Xinyu Pi, Bing Wang, Yan Gao, Jiaqi Guo, Zhoujun Li, Jian-Guang Lou