Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
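To make one of the techniques above concrete, the minimal sketch below shows how a two-parameter logistic (2PL) Item Response Theory model scores the probability that a system answers a benchmark item correctly as a function of its latent ability and the item's difficulty and discrimination. The parameter values and the framing of models as "respondents" are illustrative assumptions; evaluation frameworks that use IRT fit these parameters from observed response matrices rather than setting them by hand.

import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    # Two-parameter logistic IRT: probability that a respondent
    # (here, a model under evaluation) with ability `theta` answers
    # an item correctly, given discrimination `a` and difficulty `b`.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical example: two benchmark items of differing difficulty.
# In practice, a and b would be estimated from a response matrix.
for theta in (-1.0, 0.0, 1.0):               # latent "ability" levels
    easy = irt_2pl(theta, a=1.2, b=-0.5)     # easy, moderately discriminating
    hard = irt_2pl(theta, a=2.0, b=1.0)      # hard, highly discriminating
    print(f"ability={theta:+.1f}  P(easy)={easy:.2f}  P(hard)={hard:.2f}")

Under this view, two models with the same overall accuracy can be distinguished by which items they solve: success on high-difficulty, high-discrimination items moves the ability estimate far more than success on easy ones, which is why IRT-based benchmarks can rank systems more informatively than raw accuracy.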
Papers
Evaluation is all you need. Prompting Generative Large Language Models for Annotation Tasks in the Social Sciences. A Primer using Open Models
Maximilian Weber, Merle Reichardt
An $\ell^1$-Plug-and-Play Approach for MPI Using a Zero Shot Denoiser with Evaluation on the 3D Open MPI Dataset
Vladyslav Gapyak, Corinna Rentschler, Thomas März, Andreas Weinmann
How to Evaluate Coreference in Literary Texts?
Ana-Isabel Duron-Tejedor, Pascal Amsili, Thierry Poibeau
How much can change in a year? Revisiting Evaluation in Multi-Agent Reinforcement Learning
Siddarth Singh, Omayma Mahjoub, Ruan de Kock, Wiem Khlifi, Abidine Vall, Kale-ab Tessera, Arnu Pretorius
Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Vicki Young, Jumman Hossain, Nirmalya Roy
PromptBench: A Unified Library for Evaluation of Large Language Models
Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie
Evaluation of Infrastructure-based Warning System on Driving Behaviors-A Roundabout Study
Cong Zhang, Chi Tian, Tianfang Han, Hang Li, Yiheng Feng, Yunfeng Chen, Robert W. Proctor, Jiansong Zhang
Evaluation of Active Feature Acquisition Methods for Static Feature Settings
Henrik von Kleist, Alireza Zamanian, Ilya Shpitser, Narges Ahmidi
Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning
Jiantao Wu, Shentong Mo, Sara Atito, Josef Kittler, Zhenhua Feng, Muhammad Awais
Kattis vs. ChatGPT: Assessment and Evaluation of Programming Tasks in the Age of Artificial Intelligence
Nora Dunder, Saga Lundborg, Olga Viberg, Jacqueline Wong