Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
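As a concrete illustration of evaluation beyond raw accuracy, Item Response Theory jointly models each benchmark item's difficulty and each system's latent ability. The sketch below is a minimal, hypothetical example (simulated data and placeholder names, not drawn from any listed paper): it fits a simple one-parameter logistic (Rasch) model to a binary model-by-item response matrix with NumPy.

```python
# Minimal IRT sketch: fit a Rasch (1PL) model to a binary response matrix
# by gradient ascent on the Bernoulli log-likelihood. Data are simulated.
import numpy as np

rng = np.random.default_rng(0)

# responses[m, i] = 1 if model m answered benchmark item i correctly.
# Here: 5 models x 40 items; in practice this comes from a benchmark run.
true_ability = rng.normal(0.0, 1.0, size=5)
true_difficulty = rng.normal(0.0, 1.0, size=40)
logits = true_ability[:, None] - true_difficulty[None, :]
responses = (rng.random(logits.shape) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

def fit_rasch(responses, lr=0.05, steps=2000):
    """Jointly estimate model abilities and item difficulties."""
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)
    difficulty = np.zeros(n_items)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
        grad = responses - p              # d log-likelihood / d logit
        ability += lr * grad.sum(axis=1) / n_items
        difficulty -= lr * grad.sum(axis=0) / n_models
        difficulty -= difficulty.mean()   # center difficulties for identifiability
    return ability, difficulty

ability, difficulty = fit_rasch(responses)
print("Estimated model abilities:", np.round(ability, 2))
print("Hardest items:", np.argsort(difficulty)[-5:])
```

Unlike a plain accuracy average, the fitted abilities account for which items each model got right, so two systems with equal accuracy can still be separated if one solves systematically harder items.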
Papers
On the Evaluation of NLP-based Models for Software Engineering
Maliheh Izadi, Matin Nili Ahmadabadi
Effective data screening technique for crowdsourced speech intelligibility experiments: Evaluation with IRM-based speech enhancement
Ayako Yamamoto, Toshio Irino, Shoko Araki, Kenichi Arai, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani
Evaluation of YOLO Models with Sliced Inference for Small Object Detection
Muhammed Can Keles, Batuhan Salmanoglu, Mehmet Serdar Guzel, Baran Gursoy, Gazi Erkan Bostanci
Creating Realistic Ground Truth Data for the Evaluation of Calibration Methods for Plenoptic and Conventional Cameras
Tim Michels, Arne Petersen, Reinhard Koch
On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question Generation
Jishnu Ray Chowdhury, Debanjan Mahata, Cornelia Caragea