Global Evaluation
Research on global evaluation, across scientific domains, develops robust and reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current work emphasizes comprehensive benchmarks and evaluation frameworks that draw on techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring that AI systems are trustworthy and effective in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
Papers
Autonomous Needle Navigation in Retinal Microsurgery: Evaluation in ex vivo Porcine Eyes
Peiyao Zhang, Ji Woong Kim, Peter Gehlbach, Iulian Iordachita, Marin Kobilarov
Gene Teams are on the Field: Evaluation of Variants in Gene-Networks Using High Dimensional Modelling
Suha Tuna, Cagri Gulec, Emrah Yucesan, Ayse Cirakoglu, Yelda Tarkan Arguden
RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation
Leonard Bruns, Patric Jensfelt
Evaluation of the potential of Near Infrared Hyperspectral Imaging for monitoring the invasive brown marmorated stink bug
Veronica Ferrari, Rosalba Calvini, Bas Boom, Camilla Menozzi, Aravind Krishnaswamy Rangarajan, Lara Maistrello, Peter Offermans, Alessandro Ulrici
Decision-Focused Evaluation: Analyzing Performance of Deployed Restless Multi-Arm Bandits
Paritosh Verma, Shresth Verma, Aditya Mate, Aparna Taneja, Milind Tambe
Evaluation of Induced Expert Knowledge in Causal Structure Learning by NOTEARS
Jawad Chowdhury, Rezaur Rashid, Gabriel Terejanu
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding
Daniel O. Cajueiro, Arthur G. Nery, Igor Tavares, Maísa K. De Melo, Silvia A. dos Reis, Li Weigang, Victor R. R. Celestino
Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
Fabricio Goes, Zisen Zhou, Piotr Sawicki, Marek Grzes, Daniel G. Brown
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Lingjun Zhao, Khanh Nguyen, Hal Daumé III
Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy
Christoph Bergmeir, Frits de Nijs, Abishek Sriramulu, Mahdi Abolghasemi, Richard Bean, John Betts, Quang Bui, Nam Trong Dinh, Nils Einecke, Rasul Esmaeilbeigi, Scott Ferraro, Priya Galketiya, Evgenii Genov, Robert Glasgow, Rakshitha Godahewa, Yanfei Kang, Steffen Limmer, Luis Magdalena, Pablo Montero-Manso, Daniel Peralta, Yogesh Pipada Sunil Kumar, Alejandro Rosales-Pérez, Julian Ruddick, Akylas Stratigakos, Peter Stuckey, Guido Tack, Isaac Triguero, Rui Yuan