Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks, often incorporating techniques such as Item Response Theory and multi-faceted metrics that go beyond simple accuracy, and spanning architectures from Large Language Models (LLMs) to Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
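To make the role of Item Response Theory in benchmarking concrete, the sketch below implements the standard two-parameter logistic (2PL) model, under which an item's difficulty and discrimination jointly determine the probability that a system of a given ability answers it correctly. This is a minimal illustration of the general technique; the item parameters and ability levels are hypothetical and are not drawn from any of the papers listed below.

```python
# Minimal sketch of the two-parameter logistic (2PL) IRT model, often used
# to weight benchmark items by difficulty and discrimination rather than
# scoring with plain accuracy. All numeric values here are hypothetical.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability that a system with ability `theta` answers an item
    correctly, given item discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical benchmark items as (discrimination a, difficulty b) pairs.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 1.0)]

# Probability of success on each item for three hypothetical ability levels.
for theta in (-1.0, 0.0, 1.0):
    probs = ", ".join(f"{p_correct(theta, a, b):.2f}" for a, b in items)
    print(f"theta = {theta:+.1f}: {probs}")
```

A higher discrimination parameter makes an item separate strong and weak systems more sharply around its difficulty point, which is why IRT-based evaluations can rank models differently than raw accuracy does.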
Papers
Evaluation of Pre-Trained CNN Models for Geographic Fake Image Detection
Sid Ahmed Fezza, Mohammed Yasser Ouis, Bachir Kaddar, Wassim Hamidouche, Abdenour Hadid
Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles
Shuhei Mandokoro, Natsuki Oka, Akane Matsushima, Chie Fukada, Yuko Yoshimura, Koji Kawahara, Kazuaki Tanaka
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements
Leandro von Werra, Lewis Tunstall, Abhishek Thakur, Alexandra Sasha Luccioni, Tristan Thrush, Aleksandra Piktus, Felix Marty, Nazneen Rajani, Victor Mustar, Helen Ngo, Omar Sanseviero, Mario Šaško, Albert Villanova, Quentin Lhoest, Julien Chaumond, Margaret Mitchell, Alexander M. Rush, Thomas Wolf, Douwe Kiela
Evaluation of importance estimators in deep learning classifiers for Computed Tomography
Lennart Brocki, Wistan Marchadour, Jonas Maison, Bogdan Badic, Panagiotis Papadimitroulas, Mathieu Hatt, Franck Vermet, Neo Christopher Chung
Evaluation of taxonomic and neural embedding methods for calculating semantic similarity
Dongqiang Yang, Yanqin Yin
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, Richard Evans
Evaluation of physics constrained data-driven methods for turbulence model uncertainty quantification
Marcel Matha, Karsten Kucharczyk, Christian Morsbach
Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe
Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao
An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning
Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones, Andrew Prout, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally, Siddharth Samsi