Abstract - AIriskEval-edu: New Dataset for Risk Assessment in AI-mediated K-12 Educational Explanations
This work introduces AIriskEval-edu-db2, a new dataset designed to train and evaluate LLM-based auditors for explainable pedagogical risk assessment of K-12 instructional content. The dataset comprises 1,639 explanations from 170 curated ScienceQA questions, covering science, language arts, and social sciences. For each question, the dataset includes an explanation written by a human teacher alongside 11 explanations generated by LLM-simulated teacher profiles, each associated with a distinct pedagogical risk. We propose a comprehensive risk rubric aligned with established educational standards that covers five complementary dimensions: factual precision, depth and completeness, focus and relevance, student-level appropriateness, and ideological bias. A key contribution is the addition of 785 explanations with structured explainability annotations, including risk localization and risk description. The annotations are produced through a semi-automatic process with expert teacher validation. Finally, we present validation experiments comparing state-of-the-art proprietary models with a lightweight local Llama 3.1 8B model on both pedagogical risk detection and explainability assessment. These experiments evaluate whether supervised fine-tuning on AIriskEval-edu-db2 enables a locally deployable model to approach or outperform stronger frontier models while preserving privacy in educational auditing and assessment tasks.
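To make the dataset description concrete, here is a minimal Python sketch of how one annotated explanation might be represented. Only the five rubric dimensions and the two annotation parts (risk localization and risk description) come from the abstract. The class names, field names, the `author` profile labels, and the sample record are hypothetical illustrations, not the dataset's actual schema or content.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class RiskDimension(Enum):
    # The five rubric dimensions named in the abstract.
    FACTUAL_PRECISION = "factual precision"
    DEPTH_AND_COMPLETENESS = "depth and completeness"
    FOCUS_AND_RELEVANCE = "focus and relevance"
    STUDENT_LEVEL_APPROPRIATENESS = "student-level appropriateness"
    IDEOLOGICAL_BIAS = "ideological bias"


@dataclass
class RiskAnnotation:
    # Hypothetical layout of one explainability annotation.
    dimension: RiskDimension
    localization: str  # assumed: the risky passage, quoted from the explanation
    description: str   # free-text account of why the passage is risky


@dataclass
class Explanation:
    # Hypothetical layout of one dataset record.
    question_id: str   # assumed identifier of the source ScienceQA question
    author: str        # assumed label: human teacher or a simulated profile
    text: str
    annotations: List[RiskAnnotation] = field(default_factory=list)


def flagged_dimensions(explanation: Explanation) -> Set[RiskDimension]:
    """Return the set of rubric dimensions flagged for this explanation."""
    return {a.dimension for a in explanation.annotations}


# Invented sample record, for illustration only (not taken from the dataset).
example = Explanation(
    question_id="q-0001",
    author="simulated_profile_factual_error",
    text="Plants get all of their food from the soil through their roots.",
    annotations=[
        RiskAnnotation(
            dimension=RiskDimension.FACTUAL_PRECISION,
            localization="Plants get all of their food from the soil",
            description="Plants make their food through photosynthesis.",
        )
    ],
)
```

An auditor's output for detection could then be compared against `flagged_dimensions(example)`, while the explainability assessment would compare predicted localizations and descriptions against the annotations.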
AIriskEval-edu: New Dataset for Risk Assessment in AI-mediated K-12 Educational Explanations
1️⃣ One-sentence summary
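The paper builds a new dataset of 1,639 K-12 instructional explanations for training and evaluating AI auditing systems that automatically detect risks in instructional content, such as factual errors, incompleteness, inappropriateness, or bias, and shows experimentally that a lightweight local model, after fine-tuning, can approach the performance of top-tier models while preserving privacy.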