Evaluation Metric
Evaluation metrics are crucial for assessing the performance of machine learning models, particularly in complex tasks such as text and image generation, translation, and question answering. Current research focuses on metrics that are more nuanced and interpretable and that go beyond simple correlation with human judgments, emphasizing multi-faceted assessment, robustness to biases, and alignment with expert evaluations. Such metrics make model comparisons more reliable, guide the development of more effective algorithms, and ultimately support more trustworthy AI applications.
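A common baseline for judging a metric is to correlate its scores with human ratings of the same outputs. The sketch below illustrates that baseline under one assumption: Spearman rank correlation is used as the correlation measure, with tied values given their average rank. The function names and the five score pairs are hypothetical and illustrative. They are not data from any of the papers listed here.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical scores for five system outputs.
metric = [0.31, 0.45, 0.52, 0.60, 0.72]  # automatic metric
human = [2, 3, 3, 4, 5]                  # human ratings (one tie)
print(round(spearman(metric, human), 3))  # 0.975
```

A single correlation like this summarizes agreement in one number. It does not show which aspects of quality a metric captures or whether it is biased, which is the gap the more nuanced metrics aim to address.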
Papers
Dendrogram distance: an evaluation metric for generative networks using hierarchical clustering
Gustavo Sutter Carvalho, Moacir Antonelli Ponti
Radiology-Aware Model-Based Evaluation Metric for Report Generation
Amos Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Koji Fujimoto, Mizuho Nishio, Michael Krauthammer
The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change
Dominik Schlechtweg, Shafqat Mumtaz Virk, Pauline Sander, Emma Sköldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Jonas Kuhn, Sabine Schulte im Walde
Evaluation Metrics of Language Generation Models for Synthetic Traffic Generation Tasks
Simone Filice, Jason Ingyu Choi, Giuseppe Castellucci, Eugene Agichtein, Oleg Rokhlenko
Rethinking Evaluation Metrics of Open-Vocabulary Segmentation
Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
QualEval: Qualitative Evaluation for Model Improvement
Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan