Benchmark Platform
Benchmark platforms across scientific domains aim to provide standardized evaluations of models and algorithms, enabling fair comparisons and driving research progress. Current research focuses on developing comprehensive benchmarks in diverse areas, including natural language processing, computer vision, robotics, and healthcare, often targeting modern approaches such as large language models and other deep learning methods. These platforms are crucial for advancing the field: they facilitate reproducible research, expose the limitations of existing methods, and ultimately lead to more robust and reliable systems with real-world applications. The resulting insights inform the development of improved algorithms and contribute to a more rigorous and transparent scientific process.
Papers
Benchmarking and Enhancing Disentanglement in Concept-Residual Models
Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, Katia Sycara
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran
Benchmarking and Improving Generator-Validator Consistency of Language Models
Xiang Lisa Li, Vaishnavi Shrivastava, Siyan Li, Tatsunori Hashimoto, Percy Liang
CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery
Yuxiao Cheng, Ziqian Wang, Tingxiong Xiao, Qin Zhong, Jinli Suo, Kunlun He