Multiple Choice Question
Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs), with the aim of assessing their knowledge, reasoning, and critical-thinking abilities across diverse domains. Current research focuses on improving LLM performance on MCQs through techniques such as retrieval-augmented generation and fine-tuning with tailored demonstrations, and on mitigating biases such as positional preferences and over-reliance on the answer choices. This work matters because robust, unbiased MCQ benchmarks are crucial for measuring LLM capabilities and for ensuring their reliable use in education, professional certification, and other high-stakes contexts.
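One common way to probe the positional preferences mentioned above is to ask the same question several times with the answer options cyclically shifted, map each answer back to the original option, and take a majority vote. The sketch below is illustrative only: the `Model` callable interface and the helper names (`answers_under_shifts`, `debiased_answer`, `always_first`, `knows_paris`) are hypothetical assumptions, not the API of any particular LLM library or the method of the papers listed here. The returned agreement fraction is low for a model that answers by position and high for one that answers by content.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (assumption): a "model" is any callable that takes a
# question and an ordered list of options and returns the chosen option's index.
Model = Callable[[str, Sequence[str]], int]


def answers_under_shifts(model: Model, question: str, options: Sequence[str]) -> List[int]:
    """Query the model once per cyclic shift of the options and return each
    chosen option as an index into the ORIGINAL options list."""
    n = len(options)
    chosen = []
    for k in range(n):
        # Shift k places original option (i + k) % n at position i.
        shifted = [options[(i + k) % n] for i in range(n)]
        picked = model(question, shifted)
        chosen.append((picked + k) % n)  # map the position back to the original index
    return chosen


def debiased_answer(model: Model, question: str, options: Sequence[str]) -> Tuple[int, float]:
    """Majority vote over all shifts, plus the fraction of shifts that agree.
    Ties go to the option that was chosen first."""
    chosen = answers_under_shifts(model, question, options)
    winner, count = Counter(chosen).most_common(1)[0]
    return winner, count / len(chosen)


# Toy "models" for illustration only.
def always_first(question: str, options: Sequence[str]) -> int:
    return 0  # purely positional: always picks the first option


def knows_paris(question: str, options: Sequence[str]) -> int:
    return list(options).index("Paris")  # content-based: finds the right answer


opts = ["Berlin", "Paris", "Rome", "Madrid"]
print(debiased_answer(always_first, "Capital of France?", opts))  # (0, 0.25)
print(debiased_answer(knows_paris, "Capital of France?", opts))   # (1, 1.0)
```

Cyclic shifts need only n queries for n options, instead of the n! queries required for all permutations, while still placing every option in every position exactly once.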
Papers
Use neural networks to recognize students' handwritten letters and incorrect symbols
JiaJun Zhu, Zichuan Yang, Binjie Hong, Jiacheng Song, Jiwei Wang, Tianhao Chen, Shuilan Yang, Zixun Lan, Fei Ma
Performance of ChatGPT-3.5 and GPT-4 on the United States Medical Licensing Examination With and Without Distractions
Myriam Safrai, Amos Azaria