Can LLMs Model Incorrect Student Reasoning? A Case Study on Distractor Generation
1️⃣ One-sentence summary
This study finds that when large language models generate multiple-choice distractors, their reasoning closely mirrors best practices from the learning sciences: they first derive the correct answer, then simulate several plausible misconceptions, and finally select a set of plausible distractors. Moreover, providing the correct solution in the prompt markedly improves the quality of the generated distractors.
Modeling plausible student misconceptions is critical for AI in education. In this work, we examine how large language models (LLMs) reason about misconceptions when generating multiple-choice distractors, a task that requires modeling incorrect yet plausible answers by coordinating solution knowledge, simulating student misconceptions, and evaluating plausibility. We introduce a taxonomy for analyzing the strategies used by state-of-the-art LLMs, examining their reasoning procedures and comparing them to established best practices in the learning sciences. Our structured analysis reveals a surprising alignment between their processes and best practices: the models typically solve the problem correctly first, then articulate and simulate multiple potential misconceptions, and finally select a set of distractors. An analysis of failure modes reveals that errors arise primarily from failures in recovering the correct solution and selecting among response candidates, rather than simulating errors or structuring the process. Consistent with these results, we find that providing the correct solution in the prompt improves alignment with human-authored distractors by 8%, highlighting the critical role of anchoring to the correct solution when generating plausible incorrect student reasoning. Overall, our analysis offers a structured and interpretable lens into LLMs' ability to model incorrect student reasoning and produce high-quality distractors.
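The finding that anchoring to the correct solution improves distractor quality suggests a simple prompting pattern. Below is a minimal sketch of solution-anchored distractor generation; the `llm_complete` helper, the prompt wording, and the output parsing are all assumptions for illustration, not the paper's actual prompts or pipeline.

```python
# A minimal sketch of solution-anchored distractor generation. The helper
# llm_complete(), the prompt wording, and the parsing are illustrative
# assumptions, not the paper's actual setup.

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's completion API."""
    raise NotImplementedError


def build_distractor_prompt(question: str, correct_solution: str, n: int = 3) -> str:
    """Anchor the model to the correct solution, then walk it through the
    process the paper observes: simulate misconceptions, derive the wrong
    answers they produce, and select the most plausible as distractors."""
    return (
        "You are writing options for a multiple-choice question.\n\n"
        f"Question:\n{question}\n\n"
        f"Correct solution (anchor your reasoning to this):\n{correct_solution}\n\n"
        "Step 1: List several misconceptions a student might hold.\n"
        "Step 2: For each misconception, derive the incorrect answer it yields.\n"
        f"Step 3: Output the {n} most plausible incorrect answers as distractors, "
        "one per line, with nothing after them."
    )


def generate_distractors(question: str, correct_solution: str, n: int = 3) -> list[str]:
    response = llm_complete(build_distractor_prompt(question, correct_solution, n))
    # Keep only the final selected distractors: the last n non-empty lines.
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-n:]
```

The point of the sketch is the anchoring step: the correct solution is supplied in the prompt rather than left for the model to rediscover, which is where the paper locates the dominant failure mode and where it reports the 8% gain in alignment with human-authored distractors.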
Source: arXiv: 2603.15547