Medical Benchmark

Medical benchmarks are standardized datasets and evaluation protocols used to assess the performance of artificial intelligence models in healthcare applications, with a primary focus on diagnostic accuracy and clinical decision-making. Current research emphasizes comprehensive benchmarks that span diverse medical modalities (imaging, text, physiological signals), evaluate a range of model architectures, including large language models (LLMs) and large vision-language models (LVLMs), and compare fine-tuning strategies. These benchmarks advance the field by enabling objective comparisons of AI models, exposing their limitations, and supporting the development of more reliable and trustworthy AI tools for clinical use.

Papers

August 17, 2023

CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li
Large Language Model, Chinese Character, Medical Benchmark, Cosmic Microwave Background

July 5, 2023

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason A. Fries, Nigam H. Shah
Foundation Model, Global Evaluation, Medical Benchmark, EHR Datasets

June 20, 2023

BMAD: Benchmarks for Medical Anomaly Detection
Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li
Anomaly Detection, New Benchmark, Medical Image, Evaluation Benchmark, Medical Benchmark, Medical Anomaly Detection

April 27, 2023

Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification
Thanh-Tung Nguyen, Viktor Schlegel, Abhinav Kashyap, Stefan Winkler, Shao-Syuan Huang, Jie-Jyun Liu, Chih-Jen Lin
New Benchmark, Medical Information Mart for Intensive, Medical Coding, Medical Benchmark, ICD Code, EHR Datasets, Extreme Multilabel Classification

March 10, 2023

Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Joel Shor, Ruyue Agnes Bi, Subhashini Venugopalan, Steven Ibara, Roman Goldenberg, Ehud Rivlin
Automatic Speech Recognition, BERT Based, Speech Recognition Performance, Clinical Setting, Medical Benchmark, Transcription Error

February 28, 2022

Voxelmorph++: Going beyond the cranial vault with keypoint supervision and multi-channel instance optimisation
Mattias P. Heinrich, Lasse Hansen
Medical Benchmark, Lung CT, Cranial Defect, Instance Optimal, Brain MRI Registration, Keypoint Tracking, Abdominal CT Registration

January 3, 2022

RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark
Zhuo Deng, Yuanhao Cai, Lu Chen, Zheng Gong, Qiqi Bao, Xue Yao, Dong Fang, Shaochong Zhang, Lan Ma
Fundus Image, Image Analysis, Medical Benchmark, Transformer Based, Generative Adversarial

December 1, 2021

Personalized Federated Learning with Adaptive Batchnorm for Healthcare
Wang Lu, Jindong Wang, Yiqiang Chen, Xin Qin, Renjun Xu, Dimitrios Dimitriadis, Tao Qin
Healthcare System, Personalized Federated Learning, Batch Normalization Layer, Medical Benchmark, Batchnorm Minus Implementation