Towards Reward Modeling for AI Tutors in Math Mistake Remediation
1️⃣ One-Sentence Summary
This paper proposes a new approach to evaluating and improving the pedagogical quality of AI math tutors: by analyzing human preference data and synthesizing contrastive samples, it trains reward models that accurately judge whether a tutor's response effectively helps students discover and correct their mistakes.
Evaluating the pedagogical quality of AI tutors remains challenging: standard NLG metrics do not capture whether responses identify mistakes, scaffold reasoning, or avoid revealing the answer. For the task of mistake remediation, we derive a hierarchy of pedagogical aspects from human pairwise preferences on MRBench, and synthesize minimally contrastive response pairs that differ along key aspects (e.g., mistake identification and location, targetedness, scaffolding, actionability, clarity, and coherence). We develop and release Bradley-Terry preference models trained on weighted-sum rankings that we automatically create from MRBench, synthetic pairs, and combinations of the two. Using only synthetic data, our best model reaches 0.69 pairwise accuracy on a human preference test set, and combining weighted-sum data with targeted synthetic groups improves accuracy to 0.74, outperforming larger general-purpose reward models while using only a 0.5B-parameter backbone.
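The two core ingredients above can be illustrated concretely. Below is a minimal sketch, not the paper's implementation: the actual models use a 0.5B-parameter backbone scoring full tutor responses, while here hypothetical per-aspect scores and weights stand in for the learned reward. It shows (1) a weighted-sum ranking score over pedagogical aspects and (2) the standard Bradley-Terry pairwise loss, `-log sigmoid(r_chosen - r_rejected)`, used to train preference models.

```python
import math

def weighted_sum_rank(aspect_scores: dict, weights: dict) -> float:
    """Scalar ranking score: weighted sum over pedagogical aspect scores.

    Aspect names and weights here are illustrative, not the paper's values.
    """
    return sum(weights[k] * aspect_scores[k] for k in weights)

def bt_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss: -log sigmoid(r_chosen - r_rejected),
    computed in a numerically stable form."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical aspect scores for two tutor responses to the same student error.
weights = {"mistake_identification": 0.4, "scaffolding": 0.4, "clarity": 0.2}
response_a = {"mistake_identification": 0.9, "scaffolding": 0.8, "clarity": 0.7}
response_b = {"mistake_identification": 0.3, "scaffolding": 0.2, "clarity": 0.9}

r_a = weighted_sum_rank(response_a, weights)
r_b = weighted_sum_rank(response_b, weights)
loss = bt_pairwise_loss(r_a, r_b)  # small loss: response A is clearly preferred
```

With equal rewards the loss is log 2 ≈ 0.693; it shrinks toward 0 as the preferred response's margin grows, which is what pushes the model to separate pedagogically better responses from worse ones.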
Source: arXiv:2603.24375