New Benchmark
Recent research focuses on developing comprehensive benchmarks for evaluating machine learning models across diverse tasks, including machine learning force fields, overlapped speech detection, material segmentation, multi-physics simulation, object detection under natural distribution shifts, and video anomaly retrieval. These benchmarks aim to standardize evaluation methodologies, probe robustness across domains and hardware, and expose performance gaps that single-dataset evaluations miss. The resulting standardized evaluations and datasets enable more rigorous model comparisons, help identify areas needing improvement, and ultimately support more reliable and effective AI systems across numerous applications.
Papers
Does AI for science need another ImageNet or totally different benchmarks? A case study of machine learning force fields
Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan
Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System
Zhaohui Yin, Jingguang Tian, Xinhui Hu, Xinkang Xu, Yang Xiang
Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems
Warren R. Williams, S. Ross Glandon, Luke L. Morris, Jing-Ru C. Cheng
BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning
Sheikh Md Shakeel Hassan, Arthur Feeney, Akash Dhruv, Jihoon Kim, Youngjoon Suh, Jaiyoung Ryu, Yoonjin Won, Aparna Chandramowlishwaran
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
Xiaofeng Mao, Yuefeng Chen, Yao Zhu, Da Chen, Hang Su, Rong Zhang, Hui Xue
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang