Comprehensive Evaluation
Comprehensive evaluation rigorously assesses the performance and limitations of models and algorithms across scientific domains, particularly on complex tasks such as scientific discovery, medical image analysis, and recommendation systems. Current research emphasizes standardized benchmarks and multifaceted evaluation that combines several perspectives (e.g., quantitative metrics and human evaluation) to give a holistic picture of model capabilities. This approach supports model development and reproducibility, and it improves the reliability and trustworthiness of AI-driven solutions across diverse fields.
Papers
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation
Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, Xiaohu Li
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond
Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria
Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction
Simone Scaboro, Beatrice Portelli, Emmanuele Chersoni, Enrico Santus, Giuseppe Serra
Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction
Xuan Lin, Lichang Dai, Yafang Zhou, Zu-Guo Yu, Wen Zhang, Jian-Yu Shi, Dong-Sheng Cao, Li Zeng, Haowen Chen, Bosheng Song, Philip S. Yu, Xiangxiang Zeng
EvEval: A Comprehensive Evaluation of Event Semantics for Large Language Models
Zhengwei Tao, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yanlin Feng, Jia Li, Wenpeng Hu
GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP
Md Tawkat Islam Khondaker, Abdul Waheed, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed
A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems
Hannah Bast, Matthias Hertel, Natalie Prange