Performance Score

Performance scores are central to evaluating machine learning models and other systems, and the methods behind them are being actively refined. Research focuses on scoring methods that go beyond simple accuracy metrics, incorporating aspects such as attention weights, retrieval-augmented generation, and multi-modal feedback. These advances aim to improve model interpretability, address biases, and provide more reliable assessments of system capabilities across diverse applications, from automated essay grading to generative AI evaluation. The goal is evaluation frameworks that are more robust and trustworthy and that better reflect real-world performance.

Papers

October 25, 2023

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha
Uncertainty Quantification Performance Score Hawkes Process Prediction Uncertainty Event Prediction Event Sequence Data Transformer Hawkes Process

October 19, 2023

Testing the Consistency of Performance Scores Reported for Binary Classification Problems
Attila Fazekas, György Kovács
Machine Learning Strong Consistency Binary Classification Cross Validation Performance Score Consistency Test

October 4, 2023

LC-Score: Reference-less estimation of Text Comprehension Difficulty
Paul Tardy, Charlotte Roze, Paul Poupet
Performance Score Knowledge Comprehension Capability Reference Free

September 29, 2023

SCoRe: Submodular Combinatorial Representation Learning
Anay Majee, Suraj Kothawade, Krishnateja Killamsetty, Rishabh Iyer
Representation Learning Loss Function Contrastive Loss Submodular Maximization Performance Score

September 28, 2023

Predicting performance difficulty from piano sheet music images
Pedro Ramoneda, Jose J. Valero-Mas, Dasaem Jeong, Xavier Serra
Music Information Retrieval Performance Score Musical Score Performance Issue

September 8, 2023

Score-PA: Score-based 3D Part Assembly
Junfeng Cheng, Mingdong Wu, Ruiyuan Zhang, Guanqi Zhan, Chao Wu, Hao Dong
Robotics Domain Assembly Task Performance Score 3D Computer Vision Part Assembly 3D Part Assembly

September 5, 2023

Symbolic Music Representations for Classification Tasks: A Systematic Evaluation
Huan Zhang, Emmanouil Karystinaios, Simon Dixon, Gerhard Widmer, Carlos Eduardo Cancino-Chacón
Graph Representation Classification Task Music Information Retrieval Performance Score Symbolic Music Symbolic Expression

July 26, 2023

AI4GCC-Team -- Below Sea Level: Score and Real World Relevance
Phillip Wozny, Bram Renting, Robert Loftin, Claudia Wieners, Erman Acar
Real World Performance Score Carbon Market Time Varying Pricing Tariff Climate Economic

July 11, 2023

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores
Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
Adaptive Learning Performance Score Sparse Reward Environment Pairwise Preference Interactive Reinforcement Learning User Feedback

July 7, 2023

Simulation-free Schrödinger bridges via score and flow matching
Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio
Generative Modeling Flow Matching Performance Score Stochastic Dynamical System Schrödinger Bridge

June 15, 2023

From Database Repairs to Causality in Databases and Beyond
Leopoldo Bertossi
Query Information Causal Pattern Counterfactual Reasoning Performance Score New Database Score Based Explanation

June 8, 2023

Are fairness metric scores enough to assess discrimination biases in machine learning?
Fanny Jourdan, Laurent Risser, Jean-Michel Loubes, Nicholas Asher
Machine Learning Procedural Fairness Gender Bias Performance Score

May 31, 2023

Adaptive Conformal Regression with Jackknife+ Rescaled Scores
Nicolas Deutschmann, Mattia Rigotti, Maria Rodriguez Martinez
Performance Score Conformal Regression Conformal Prediction Interval Error Distribution Conformal Score

May 22, 2023

"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme
Pre Training Performance Score Model Generated Quotation Attribution

May 7, 2023

Score: A Rule Engine for the Scone Knowledge Base System
Jeffrey Chen, Scott E. Fahlman
Knowledge Base Performance Score Rule Engine

April 9, 2023

Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance
Abdolvahab Khademi
ChatGPT Generated Conversation Performance Score AI Chatbots Human Performance Reliability Analysis Google Bard OpenAI ChatGPT

February 15, 2023

Measuring the Instability of Fine-Tuning
Yupei Du, Dong Nguyen
Fine Tuning Pre Trained Language Model Core Stability Performance Score

January 21, 2023

Poor Man's Quality Estimation: Predicting Reference-Based MT Metrics Without the Reference
Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan
Language Model Translation Quality Performance Score Higher Quality Reference Quality Estimation MT Evaluation Quality Estimation Shared Task

December 30, 2022

MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui
Generative Model Generative Modeling Theoretical Understanding Generative Question Practice Mode Performance Score Modern Language Model Photorealistic Image Image Modality

December 28, 2022

All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks
Carina Geldhauser, Johan Liljegren, Pontus Nordqvist
Pytorch Model Quality Issue Performance Score Lip Synchronization