LLM Prompt Evaluation for Educational Applications
1️⃣ One-Sentence Summary
This study proposes a generalizable, systematic method for evaluating and optimizing LLM prompt designs in educational applications, and shows experimentally that a prompt template combining persona and context-manager patterns, designed to support metacognitive learning strategies, performs best at generating personalized, pedagogically aligned outputs.
As large language models (LLMs) become increasingly common in educational applications, there is a growing need for evidence-based methods to design and evaluate LLM prompts that produce personalized and pedagogically aligned outputs. This study presents a generalizable, systematic approach for evaluating prompts, demonstrated through an analysis of LLM-generated follow-up questions in a structured dialogue activity. Six prompt templates were designed and tested. The templates incorporated established prompt engineering patterns, with each prompt emphasizing distinct pedagogical strategies. The prompt templates were compared through a tournament-style evaluation framework that can be adapted for other educational applications. The tournament employed the Glicko2 rating system with eight judges evaluating question pairs across three dimensions: format, dialogue support, and appropriateness for learners. Data was sourced from 120 authentic user interactions across three distinct educational deployments. Results showed that a single prompt related to strategic reading outperformed other templates with win probabilities ranging from 81% to 100% in pairwise comparisons. This prompt combined persona and context manager patterns and was designed to support metacognitive learning strategies such as self-directed learning. The methodology showcases how educational technology researchers can systematically evaluate and improve prompt designs, moving beyond ad-hoc prompt engineering toward evidence-based prompt development for educational applications.
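The abstract reports pairwise win probabilities derived from Glicko-2 ratings. As an illustration only, the minimal sketch below shows one common way such probabilities can be computed from two templates' ratings and rating deviations using the Glicko-2 expected-score formula. The function names, example ratings, and the convention of combining both rating deviations for a head-to-head prediction are assumptions made for this sketch, not details taken from the paper.

```python
import math

# Constants from Glickman's Glicko-2 specification: ratings are converted
# to the internal scale before the expected-score formula is applied.
GLICKO2_SCALE = 173.7178
BASE_RATING = 1500.0


def _g(phi: float) -> float:
    """Dampening factor g(phi) from the Glicko-2 system."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def win_probability(rating_a: float, rd_a: float,
                    rating_b: float, rd_b: float) -> float:
    """Expected score (win probability) of template A against template B.

    rating_*: Glicko-2 rating on the familiar 1500-centered scale.
    rd_*: rating deviation (uncertainty) on the same scale.
    """
    mu_a = (rating_a - BASE_RATING) / GLICKO2_SCALE
    mu_b = (rating_b - BASE_RATING) / GLICKO2_SCALE
    # Combining both deviations is a common convention for head-to-head
    # predictions; the paper's exact computation may differ.
    phi = math.sqrt((rd_a / GLICKO2_SCALE) ** 2 + (rd_b / GLICKO2_SCALE) ** 2)
    return 1.0 / (1.0 + math.exp(-_g(phi) * (mu_a - mu_b)))


if __name__ == "__main__":
    # Hypothetical post-tournament ratings for two prompt templates.
    print(f"P(A beats B) = {win_probability(1720, 60, 1480, 70):.2f}")
```

In a tournament like the one described, each judged comparison between two templates would update their ratings and deviations, and the final ratings would yield the reported pairwise win probabilities.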
Source: arXiv: 2601.16134