Multiple Choice
Multiple-choice question answering (MCQA) serves as a crucial benchmark format for evaluating large language models (LLMs), probing their knowledge, reasoning, and instruction-following across diverse domains. Current research aims to make MCQA evaluation more reliable by addressing limitations such as format bias and by developing more robust evaluation metrics, often drawing on parameter-efficient fine-tuning (e.g., LoRA) and analysis of attention mechanisms in transformer architectures. These advances matter because reliable MCQA benchmarks are essential for guiding LLM development and supporting responsible deployment across applications ranging from education and healthcare to specialized fields such as materials science and cybersecurity.
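To make the evaluation setup concrete, the sketch below shows one common way to score an MCQA item: present the question with lettered options and compare the log-probability the model assigns to each answer letter as the next token. This is a minimal illustration, not the protocol of any specific benchmark or paper listed here; the model name, prompt template, and example question are assumptions chosen for brevity.

```python
# Minimal MCQA scoring sketch: rank answer letters by next-token log-probability.
# Model name, prompt format, and the example item are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the LLM under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_option(question: str, options: dict, letter: str) -> float:
    """Log-probability the model assigns to `letter` as the answer token."""
    prompt = (
        question + "\n"
        + "\n".join(f"{k}. {v}" for k, v in options.items())
        + "\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the token after "Answer:"
    log_probs = torch.log_softmax(next_token_logits, dim=-1)
    letter_id = tokenizer.encode(" " + letter)[0]  # leading space matches GPT-2 BPE tokenization
    return log_probs[letter_id].item()

question = "Which organ pumps blood through the body?"
options = {"A": "Liver", "B": "Heart", "C": "Lung", "D": "Kidney"}
scores = {k: score_option(question, options, k) for k in options}
print(max(scores, key=scores.get))  # predicted answer letter
```

In practice, format-bias studies often extend this kind of scoring loop by permuting the option order (or relabeling the letters) and averaging the predictions, so that a model's preference for a particular answer position does not inflate its measured accuracy.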
Papers
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework
Zonghai Yao, Zihao Zhang, Chaolong Tang, Xingyu Bian, Youxia Zhao, Zhichao Yang, Junda Wang, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Hong Yu
DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models
Yuxuan Zhang, Ruizhe Li