New Benchmark
Recent research focuses on developing comprehensive benchmarks for evaluating machine learning models across diverse tasks and modalities, including face anti-spoofing, social bias in dialog systems, adversarial robustness in remote sensing, Chinese cross-modal pre-training, spontaneous-speech recognition, EEG transfer learning, and protein design. These benchmarks aim to standardize evaluation methodologies, address issues such as fairness and robustness, and quantify uncertainty in model performance across a range of architectures, including transformers and graph neural networks. The resulting standardized evaluations and datasets support more rigorous model comparisons and help identify areas needing improvement, ultimately leading to more reliable and effective AI systems across numerous applications.
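To make the "standardized evaluation with uncertainty quantification" theme concrete, below is a minimal sketch of a generic benchmark harness: it scores a model's predictions against gold labels and reports a bootstrap confidence interval for the metric. All names here (Example, evaluate_benchmark, the toy dataset) are illustrative assumptions, not the protocol of any paper listed in this section.

import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    inputs: str   # task input, e.g. a prompt or an image path
    label: str    # gold answer


def accuracy(preds: List[str], labels: List[str]) -> float:
    # Fraction of exact matches between predictions and gold labels.
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def evaluate_benchmark(
    model: Callable[[str], str],
    dataset: List[Example],
    n_bootstrap: int = 1000,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Return point-estimate accuracy and a 95% bootstrap confidence interval."""
    preds = [model(ex.inputs) for ex in dataset]
    labels = [ex.label for ex in dataset]
    point = accuracy(preds, labels)

    # Bootstrap resampling over examples to quantify uncertainty in the metric.
    rng = random.Random(seed)
    n = len(dataset)
    resampled = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled.append(accuracy([preds[i] for i in idx], [labels[i] for i in idx]))
    resampled.sort()
    ci_low = resampled[int(0.025 * n_bootstrap)]
    ci_high = resampled[int(0.975 * n_bootstrap)]
    return point, (ci_low, ci_high)


if __name__ == "__main__":
    # Toy usage: a trivial "model" that echoes its input, scored on a tiny dataset.
    data = [Example("a", "a"), Example("b", "b"), Example("c", "d")]
    acc, (lo, hi) = evaluate_benchmark(lambda x: x, data)
    print(f"accuracy = {acc:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")

Reporting an interval rather than a single number is one simple way such benchmarks make comparisons between models more rigorous; real suites typically add task-specific metrics and fairness or robustness splits on top of this basic loop.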
Papers
Flexible-Modal Face Anti-Spoofing: A Benchmark
Zitong Yu, Ajian Liu, Chenxu Zhao, Kevin H. M. Cheng, Xu Cheng, Guoying Zhao
Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks
Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen Meng
Universal Adversarial Examples in Remote Sensing: Methodology and Benchmark
Yonghao Xu, Pedram Ghamisi
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Minzhe Niu, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, Chunjing Xu, Hang Xu
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik, A. Balog, T. E. Gráczi, A. Kohári, B. Tarján, K. Mády
Team Cogitat at NeurIPS 2021: Benchmarks for EEG Transfer Learning Competition
Stylianos Bakas, Siegfried Ludwig, Konstantinos Barmpas, Mehdi Bahri, Yannis Panagakis, Nikolaos Laskaris, Dimitrios A. Adamos, Stefanos Zafeiriou
AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Zhangyang Gao, Cheng Tan, Stan Z. Li