Multiple-Choice Questions
Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs), assessing their knowledge, reasoning, and critical-thinking abilities across diverse domains. Current research focuses on improving LLM performance on MCQs, exploring techniques such as retrieval-augmented generation, fine-tuning with tailored demonstrations, and mitigating biases such as positional preferences and over-reliance on answer choices. This research matters because robust and unbiased MCQ benchmarks are crucial for evaluating LLM capabilities and for ensuring their reliable application in education, professional certification, and other high-stakes contexts.
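To illustrate the positional-bias issue mentioned above, here is a minimal, purely illustrative Python sketch (not taken from any of the papers below): it re-asks the same question with the answer options shuffled and checks whether the model's choices cluster on a particular position. The names `evaluate_mcq_position_bias` and `ask_model`, and the toy question, are hypothetical; `ask_model` stands in for whatever function queries an actual LLM and returns the index of its chosen option.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def evaluate_mcq_position_bias(
    ask_model: Callable[[str, Sequence[str]], int],
    question: str,
    options: Sequence[str],
    correct_index: int,
    n_shuffles: int = 20,
    seed: int = 0,
) -> dict:
    """Ask the same MCQ repeatedly with shuffled option order.

    A position-insensitive model should recover the correct option no matter
    where it appears; a skew in `picked_positions` suggests positional bias.
    """
    rng = random.Random(seed)
    picked_positions = Counter()
    correct = 0
    for _ in range(n_shuffles):
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        choice = ask_model(question, shuffled)   # index into the shuffled list
        picked_positions[choice] += 1
        if order[choice] == correct_index:       # map back to the original option
            correct += 1
    return {
        "accuracy": correct / n_shuffles,
        "picked_positions": dict(picked_positions),
    }


if __name__ == "__main__":
    # Toy stand-in for a maximally position-biased "model" that always picks option A.
    always_first = lambda q, opts: 0
    report = evaluate_mcq_position_bias(
        always_first,
        question="What is the capital of France?",
        options=["Paris", "Lyon", "Marseille", "Nice"],
        correct_index=0,
    )
    print(report)  # accuracy ~0.25 with 4 options; all picks land on position 0
```

With a genuine LLM plugged in as `ask_model`, a roughly uniform `picked_positions` together with stable accuracy would indicate little positional preference, whereas a heavy skew toward one position would flag the kind of bias the papers below investigate.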
Papers
(A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice
Inyoung Cheong, King Xia, K. J. Kevin Feng, Quan Ze Chen, Amy X. Zhang
Distractor Generation for Multiple-Choice Questions: A Survey of Methods, Datasets, and Evaluation
Elaf Alhazmi, Quan Z. Sheng, Wei Emma Zhang, Munazza Zaib, Ahoud Alhazmi
LLMs May Perform MCQA by Selecting the Least Incorrect Option
Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, Ting Liu
Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine
Qiao Jin, Fangyuan Chen, Yiliang Zhou, Ziyang Xu, Justin M. Cheung, Robert Chen, Ronald M. Summers, Justin F. Rousseau, Peiyun Ni, Marc J. Landsman, Sally L. Baxter, Subhi J. Al'Aref, Yijia Li, Alex Chen, Josef A. Brejt, Michael F. Chiang, Yifan Peng, Zhiyong Lu
Improved Performances and Motivation in Intelligent Tutoring Systems: Combining Machine Learning and Learner Choice
Benjamin Clément, Hélène Sauzéon, Didier Roy, Pierre-Yves Oudeyer