Global Evaluation

Global evaluation in various scientific domains focuses on developing robust and reliable methods for assessing the performance of models and systems, often addressing challenges in data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes the development of comprehensive benchmarks and evaluation frameworks, often incorporating techniques like Item Response Theory and multi-faceted metrics beyond simple accuracy, and utilizing diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advancements are crucial for ensuring the trustworthiness and effectiveness of AI systems across diverse applications, from medical diagnosis to autonomous driving, and for fostering reproducible and comparable research within the scientific community.

723papers

Papers - Page 27

January 3, 2024

Evaluation of automated driving system safety metrics with logged vehicle trajectory data
Safety Metric Global Evaluation Vehicle Trajectory Data

December 31, 2023

Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance
Global Evaluation Generative Model Expressive Performance Full State Reconstruction Music Information Retrieval Reference Based Metric High Performance Piano Performance

December 18, 2023

Development and Evaluation of Ensemble Learning-based Environmental Methane Detection and Intensity Prediction Models
Machine Learning Model Global Evaluation Methane Flux Intensity Distribution Methane Source Ensemble Learning Development Activity

December 14, 2023

Design, construction and evaluation of emotional multimodal pathological speech database
Emotional Expression Emotion Recognition Construction Industry Hypokinetic Dysarthria Global Evaluation Product Design Emotional Speech

December 11, 2023

Evaluation of Large Language Models for Decision Making in Autonomous Driving
Global Evaluation Autonomous Driving Vehicle Plan Decision Making

December 10, 2023

Large Language Models on Lexical Semantic Change Detection: An Evaluation
Language Model Global Evaluation Large Language Model Lexical Semantic Change Detection

December 8, 2023

Conformal Prediction in Multi-User Settings: An Evaluation
Conformal Prediction Global Evaluation Conformal Score Performance Metric Multi User

December 6, 2023

December 4, 2023

Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Machine Learning Wearable Device Machine Learning Model Health Monitoring High Diagram to Diagram Variability Global Evaluation

December 3, 2023

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings
Offline Reinforcement Learning Active Feature Acquisition Feature Acquisition Global Evaluation Set Based Time Dependent Feature

December 2, 2023

November 30, 2023

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Global Evaluation Large Language Model Lower Critique Accuracy Natural Language Processing Model Criticism

Global Evaluation

Papers - Page 27

Evaluation of automated driving system safety metrics with logged vehicle trajectory data

Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance

Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models

An \ell¹-Plug-and-Play Approach for MPI Using a Zero Shot Denoiser with Evaluation on the 3D Open MPI Dataset

How to Evaluate Coreference in Literary Texts?

Development and Evaluation of Ensemble Learning-based Environmental Methane Detection and Intensity Prediction Models

Design, construction and evaluation of emotional multimodal pathological speech database

How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning

Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies

PromptBench: A Unified Library for Evaluation of Large Language Models

Evaluation of Large Language Models for Decision Making in Autonomous Driving

Large Language Models on Lexical Semantic Change Detection: An Evaluation

Conformal Prediction in Multi-User Settings: An Evaluation

Evaluation of Infrastructure-based Warning System on Driving Behaviors-A Roundabout Study

Evaluation of Active Feature Acquisition Methods for Static Feature Settings

Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation

Global Evaluation

Papers - Page 27

Evaluation of automated driving system safety metrics with logged vehicle trajectory data

Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance

Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models

An \ell1-Plug-and-Play Approach for MPI Using a Zero Shot Denoiser with Evaluation on the 3D Open MPI Dataset

How to Evaluate Coreference in Literary Texts?

Development and Evaluation of Ensemble Learning-based Environmental Methane Detection and Intensity Prediction Models

Design, construction and evaluation of emotional multimodal pathological speech database

How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning

Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies

PromptBench: A Unified Library for Evaluation of Large Language Models

Evaluation of Large Language Models for Decision Making in Autonomous Driving

Large Language Models on Lexical Semantic Change Detection: An Evaluation

Conformal Prediction in Multi-User Settings: An Evaluation

Evaluation of Infrastructure-based Warning System on Driving Behaviors-A Roundabout Study

Evaluation of Active Feature Acquisition Methods for Static Feature Settings

Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation

An \ell¹-Plug-and-Play Approach for MPI Using a Zero Shot Denoiser with Evaluation on the 3D Open MPI Dataset