Multiple Choice
Multiple-choice question answering (MCQA) serves as a crucial benchmark for evaluating large language models (LLMs), assessing their knowledge, reasoning, and ability to follow instructions across diverse domains. Current research focuses on improving LLM performance on MCQA tasks by addressing limitations like format biases and developing more robust evaluation metrics, often employing techniques like parameter-efficient fine-tuning (e.g., LoRA) and attention mechanism analysis within transformer architectures. These advancements are significant because reliable MCQA benchmarks are essential for advancing LLM development and ensuring their responsible deployment in various applications, from education and healthcare to specialized fields like materials science and cybersecurity.
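To make the standard MCQA evaluation setup concrete, below is a minimal illustrative sketch (not the method of any paper listed here) of letter-based answer scoring with a causal language model: the prompt ends just before the answer symbol, and the next-token distribution over "A"–"D" serves as the model's choice and confidence. The model name, prompt template, and question are placeholder assumptions; it assumes the Hugging Face transformers library.

```python
# Sketch: scoring multiple-choice options via next-token logits over answer letters.
# All model/prompt choices below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM can be substituted
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

question = "Which planet is known as the Red Planet?"
options = {"A": "Venus", "B": "Mars", "C": "Jupiter", "D": "Saturn"}

# The prompt stops right before the answer letter, so the next-token
# distribution over the option symbols reflects the model's selection.
prompt = (
    question + "\n"
    + "\n".join(f"{k}. {v}" for k, v in options.items())
    + "\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token

# Token ids for the answer symbols (leading space matches GPT-2-style tokenization).
letter_ids = {k: tokenizer.encode(" " + k)[0] for k in options}
letter_logits = torch.stack([next_token_logits[i] for i in letter_ids.values()])
probs = torch.softmax(letter_logits, dim=0)  # normalized confidence over the options

for letter, p in zip(options, probs):
    print(f"{letter}: {p.item():.3f}")
print("Predicted:", max(zip(options, probs), key=lambda x: x[1])[0])
```

Restricting the softmax to the option symbols, as above, is one common evaluation convention; comparing it against free-form generation is precisely where issues like symbol binding, format bias, and confidence calibration arise.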
Papers
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education
Duc-Vu Nguyen, Quoc-Nam Nguyen
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu