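When Can We Trust Early Warnings? Leakage-Excluded Early Outcome Prediction from LMS Interaction Logs
1️⃣ One-sentence summary
This paper finds that early-warning models built from Learning Management System logs often report inflated accuracy because they use future information. It proposes the LEAP protocol to strictly exclude this kind of data leakage, and shows experimentally how true model performance changes across prediction time points.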
Early-warning models built from Learning Management System (LMS) logs aim to predict end-of-course outcomes early enough to enable timely learner support. However, reported "early" performance is often inflated by temporal leakage. This occurs when the pipeline uses information that would not yet be available at the time of prediction. We formalize cutoff-based early outcome prediction under a temporal availability constraint and introduce LEAP (Leakage-Excluded Early-Availability Protocol), which enforces cutoff-first truncation prior to joins and aggregation and audits feature provenance to prevent post-cutoff evidence from entering the benchmark. We instantiate LEAP on the public Open University Learning Analytics Dataset (OULAD) as a multi-step protocol for leakage-controlled evaluation across weekly cutoffs. Using several standard learning methods, we evaluate performance using ROC-AUC, PR-AUC, Brier score, and F1@0.5. Results show improving performance as the observation window expands, with a marked gain around week 3; Random Forest performs best at the earliest cutoffs, while Gradient Boosting dominates thereafter. Leakage ablations further show that temporal violations, especially through assessment information, can inflate apparent "early" performance.
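The idea of cutoff-first truncation can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (`student_id`, `day`, `clicks`), the helper names, the inclusive cutoff, and the toy data are all assumptions made for illustration. The point is only the ordering: events are filtered to the cutoff before any aggregation, so post-cutoff activity cannot reach the features.

```python
from collections import defaultdict


def truncate_at_cutoff(events, cutoff_day):
    """Keep only events observed on or before the cutoff (inclusive cutoff is an assumption)."""
    return [e for e in events if e["day"] <= cutoff_day]


def aggregate_clicks(events):
    """Aggregate per-learner totals over whatever events are passed in."""
    clicks = defaultdict(int)
    days = defaultdict(set)
    for e in events:
        clicks[e["student_id"]] += e["clicks"]
        days[e["student_id"]].add(e["day"])
    return {
        sid: {"total_clicks": clicks[sid], "active_days": len(days[sid])}
        for sid in clicks
    }


def early_features(events, cutoff_day):
    """Truncate first, then aggregate, with a simple provenance check in between."""
    visible = truncate_at_cutoff(events, cutoff_day)
    # Hypothetical audit step: no record feeding the features may postdate the cutoff.
    assert all(e["day"] <= cutoff_day for e in visible)
    return aggregate_clicks(visible)


# Toy interaction log (made-up values, not taken from OULAD).
events = [
    {"student_id": "s1", "day": 3, "clicks": 5},
    {"student_id": "s1", "day": 10, "clicks": 7},
    {"student_id": "s1", "day": 25, "clicks": 40},  # after the cutoff below
    {"student_id": "s2", "day": 6, "clicks": 2},
]

# Cutoff at day 14, i.e. the end of week 2 under a 7-day-week assumption.
features = early_features(events, cutoff_day=14)
leaky = aggregate_clicks(events)  # aggregating the full log lets day 25 leak in
```

Comparing `features` with `leaky` shows the effect the abstract describes: for learner `s1`, the leaky aggregate counts the day-25 activity that would not yet exist when a week-2 prediction is made.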
Source: arXiv: 2605.25794