arXiv submission date: 2026-03-03
📄 Abstract - From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench

Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comprehensive, multi-turn teaching effectiveness. In this paper, we introduce KMP-Bench, a comprehensive K-8 Mathematical Pedagogical Benchmark designed to assess LLMs from two complementary perspectives. The first module, KMP-Dialogue, evaluates holistic pedagogical capabilities against six core principles (e.g., Challenge, Explanation, Feedback), leveraging a novel multi-turn dialogue dataset constructed by weaving together diverse pedagogical components. The second module, KMP-Skills, provides a granular assessment of foundational tutoring abilities, including multi-turn problem-solving, error detection and correction, and problem generation. Our evaluations on KMP-Bench reveal a key disparity: while leading LLMs excel at tasks with verifiable solutions, they struggle with the nuanced application of pedagogical principles. Additionally, we present KMP-Pile, a large-scale (150K) dialogue dataset. Models fine-tuned on KMP-Pile show substantial improvement on KMP-Bench, underscoring the value of pedagogically-rich training data for developing more effective AI math tutors.

Top-level tags: llm benchmark model evaluation
Detailed tags: mathematical tutoring pedagogical evaluation multi-turn dialogue fine-tuning educational ai

From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench


1️⃣ One-Sentence Summary

This paper introduces KMP-Bench, a comprehensive benchmark for evaluating the teaching ability of large language models in K-8 mathematical tutoring. It finds that while current models are strong at solving problems, they still fall short in following pedagogical principles such as guidance, explanation, and feedback, and it demonstrates that training on high-quality pedagogical dialogue data significantly improves models' tutoring effectiveness.

Source: arXiv: 2603.02775