New Benchmark
Recent research develops comprehensive benchmarks for evaluating large language models (LLMs) and other machine learning models across diverse tasks, including economic games, financial question answering, graph analysis, and robotic manipulation. These benchmarks aim to standardize evaluation methodology, address issues such as fairness and robustness, and quantify uncertainty in model performance for architectures such as transformers and graph neural networks. The resulting standardized evaluations and datasets enable more rigorous comparisons between models and help identify areas needing improvement, supporting more reliable and effective AI systems across many applications.
Papers
DIAS: A Dataset and Benchmark for Intracranial Artery Segmentation in DSA sequences
Wentao Liu, Tong Tian, Lemeng Wang, Weijin Xu, Lei Li, Haoyuan Li, Wenyi Zhao, Siyu Tian, Xipeng Pan, Huihua Yang, Feng Gao, Yiming Deng, Xin Yang, Ruisheng Su
Towards Mitigating more Challenging Spurious Correlations: A Benchmark & New Datasets
Siddharth Joshi, Yu Yang, Yihao Xue, Wenhan Yang, Baharan Mirzasoleiman
BMAD: Benchmarks for Medical Anomaly Detection
Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang, Qihui Zhang, Philip S. Yu, Lichao Sun
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Willy Fitra Hendria
FedMultimodal: A Benchmark For Multimodal Federated Learning
Tiantian Feng, Digbalay Bose, Tuo Zhang, Rajat Hebbar, Anil Ramakrishna, Rahul Gupta, Mi Zhang, Salman Avestimehr, Shrikanth Narayanan
Datasets and Benchmarks for Offline Safe Reinforcement Learning
Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
GeneCIS: A Benchmark for General Conditional Image Similarity
Sagar Vaze, Nicolas Carion, Ishan Misra
VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON
Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ramazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, Meng Cao