Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks, often incorporating techniques such as Item Response Theory and multi-faceted metrics that go beyond simple accuracy, and spans diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
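To make the Item Response Theory idea concrete, the sketch below fits a one-parameter (Rasch) IRT model to a binary model-by-item response matrix, jointly estimating a latent ability per model and a latent difficulty per benchmark item. This is a generic illustration of the technique, not the method of any paper listed here; the response data and function names are hypothetical.

```python
# Minimal Rasch (1PL) IRT sketch for benchmark evaluation.
# All data below are fabricated toy values for illustration only.
import numpy as np

def fit_rasch(responses, n_iters=500, lr=0.1):
    """Estimate model abilities and item difficulties by gradient ascent
    on the Bernoulli log-likelihood of a Rasch model.

    responses: (n_models, n_items) binary matrix, 1 = item answered correctly.
    """
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)      # theta_m: latent skill per model
    difficulty = np.zeros(n_items)    # b_i: latent difficulty per item
    for _ in range(n_iters):
        # P(correct) = sigmoid(theta_m - b_i)
        logits = ability[:, None] - difficulty[None, :]
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = responses - probs      # d(log-likelihood)/d(logit)
        ability += lr * grad.sum(axis=1) / n_items
        difficulty -= lr * grad.sum(axis=0) / n_models
        difficulty -= difficulty.mean()  # center difficulties for identifiability
    return ability, difficulty

# Toy usage: 3 models evaluated on 5 benchmark items (fabricated data).
responses = np.array([[1, 1, 1, 0, 0],
                      [1, 1, 0, 0, 0],
                      [1, 1, 1, 1, 0]])
ability, difficulty = fit_rasch(responses)
print("model abilities:", np.round(ability, 2))
print("item difficulties:", np.round(difficulty, 2))
```

Unlike raw accuracy, the fitted abilities discount easy items and reward success on hard ones, which is why IRT-style scoring is attractive when benchmark items vary widely in difficulty.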
Papers
Evaluation of Country Dietary Habits Using Machine Learning Techniques in Relation to Deaths from COVID-19
María Teresa García-Ordás, Natalia Arias, Carmen Benavides, Oscar García-Olalla, José Alberto Benítez-Andrades
An evaluation of Deep Learning based stereo dense matching dataset shift from aerial images and a large scale stereo dataset
Teng Wu, Bruno Vallet, Marc Pierrot-Deseilligny, Ewelina Rupnik
Design and evaluation of a multi-finger skin-stretch tactile interface for hand rehabilitation robots
Alexandre L. Ratschat, Rubén Martín-Rodríguez, Yasemin Vardar, Gerard M. Ribbers, Laura Marchal-Crespo
Class-incremental Learning for Time Series: Benchmark and Evaluation
Zhongzheng Qiao, Quang Pham, Zhen Cao, Hoang H Le, P. N. Suganthan, Xudong Jiang, Ramasamy Savitha
Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought
Yuying Du, Xueyan Tang
A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future Directions
Xiaxia Wang, Gong Cheng
MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization
Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao