Black Box
"Black box" refers to a system whose internal workings are opaque, which makes its behavior hard to understand and analyze. Current research develops methods to analyze black-box models and mitigate their limitations, particularly for deep neural networks, in applications such as code generation, robot design, and autonomous systems. Key approaches include building surrogate models, applying novel optimization techniques, and designing explainable AI (XAI) methods that improve interpretability and trustworthiness. This research matters for the safety, reliability, and fairness of increasingly prevalent AI systems.
Papers
FedDTPT: Federated Discrete and Transferable Prompt Tuning for Black-Box Large Language Models
Jiaqi Wu, Simin Chen, Yuzhe Yang, Yijiang Li, Shiyue Hou, Rui Jing, Zehua Wang, Wei Chen, Zijian Tian
Black-Box Forgetting
Yusuke Kuwana, Yuta Goto, Takashi Shibata, Go Irie
KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks
Quan Zhou, Changhua Pei, Fei Sun, Jing Han, Zhengwei Gao, Dan Pei, Haiming Zhang, Gaogang Xie, Jianhui Li
Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution
Senne Deproost, Denis Steckelmacher, Ann Nowé
Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models
Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, Linfeng Zhang
Stealthy Jailbreak Attacks on Large Language Models via Benign Data Mirroring
Honglin Mu, Han He, Yuxin Zhou, Yunlong Feng, Yang Xu, Libo Qin, Xiaoming Shi, Zeming Liu, Xudong Han, Qi Shi, Qingfu Zhu, Wanxiang Che
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo, Yige Li, Xingjun Ma, Yu-Gang Jiang
S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
Yongxiang Liu, Bowen Peng, Li Liu, Xiang Li
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
Xinyuan Wang, Victor Shea-Jay Huang, Renmiao Chen, Hao Wang, Chengwei Pan, Lei Sha, Minlie Huang