arXiv submission date: 2026-06-02
📄 Abstract - Leveraging BART to Assess CS1 C++ Programming Assignments using Rubric-based Criteria

This paper investigates rubric-aware, multitask fine-tuning of transformer models for automated grading of introductory C++ programming assignments, with the goal of producing grade predictions that better reflect instructor grading behavior than general-purpose LLMs. Using multi-semester CS1 data, student submissions are paired with numeric scores, letter-grade buckets, and assignment rubrics, then preprocessed into unified sequences for transformer input. A BART encoder-decoder with LoRA adaptation is trained to jointly predict numeric grades and grade buckets, augmented with a distribution-matching term to align predicted and empirical grade distributions, an evaluation dimension often overlooked in prior work. Experiments compare single-task and multitask training, hard one-hot versus fuzzy and boundary-based soft labels, and rubric versus no-rubric conditions, with additional T5 and pairwise-pretrained variants. Results show that multitask BART with boundary-based soft labels and rubric context achieves lower mean absolute error and stronger grade-distribution alignment than single-task, hard-label, or code-only baselines. Fully fine-tuned T5 further improves distributional fidelity, while pairwise pretraining reduces numeric error at the cost of minority-class sensitivity. Collectively, the findings suggest that calibration-aware, rubric-guided training produces more instructor-like grading behavior than accuracy-optimized alternatives.
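The abstract names three training ingredients (a numeric-grade objective, a grade-bucket objective with boundary-based soft labels, and a distribution-matching term) without giving formulas. The following is a minimal sketch of how such a combined loss could look, not the authors' implementation. Everything concrete in it is an assumption for illustration: the bucket cutoffs in `BUCKETS`, the linear sharing of probability mass within `width` points of a cutoff, the use of KL divergence as the distribution-matching term, and the weights `alpha`, `beta`, and `gamma`.

```python
import math

# Hypothetical letter-grade cutoffs (lower bounds); the abstract does not state them.
BUCKETS = [("F", 0.0), ("D", 60.0), ("C", 70.0), ("B", 80.0), ("A", 90.0)]


def boundary_soft_label(score, width=3.0):
    """Return a probability vector over BUCKETS for a numeric score.

    Assumed scheme: a score within `width` points of a cutoff shares its
    mass linearly with the neighbouring bucket; otherwise the label is one-hot.
    """
    lows = [low for _, low in BUCKETS]
    # Index of the bucket the score falls into (scores below 0 map to bucket 0).
    idx = max((i for i, low in enumerate(lows) if score >= low), default=0)
    probs = [0.0] * len(BUCKETS)
    probs[idx] = 1.0
    # Close to this bucket's lower cutoff: share mass with the bucket below.
    if idx > 0 and score - lows[idx] < width:
        share = 0.5 * (1.0 - (score - lows[idx]) / width)
        probs[idx] -= share
        probs[idx - 1] += share
    # Close to the next bucket's cutoff: share mass with the bucket above.
    if idx + 1 < len(lows) and lows[idx + 1] - score <= width:
        share = 0.5 * (1.0 - (lows[idx + 1] - score) / width)
        probs[idx] -= share
        probs[idx + 1] += share
    return probs


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and predicted bucket probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over discrete distributions; zero-mass entries of p are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def multitask_loss(true_scores, pred_scores, pred_bucket_probs, empirical_dist,
                   alpha=1.0, beta=1.0, gamma=0.1):
    """Weighted sum of numeric error, soft-label bucket loss, and distribution mismatch."""
    n = len(true_scores)
    # Numeric-grade task: mean absolute error.
    mae = sum(abs(t - p) for t, p in zip(true_scores, pred_scores)) / n
    # Grade-bucket task: cross-entropy against boundary-based soft labels.
    ce = sum(cross_entropy(boundary_soft_label(t), q)
             for t, q in zip(true_scores, pred_bucket_probs)) / n
    # Distribution matching: compare the batch-average predicted bucket
    # distribution with the empirical grade distribution (KL is an assumption).
    mean_pred = [sum(q[j] for q in pred_bucket_probs) / n
                 for j in range(len(BUCKETS))]
    mismatch = kl_divergence(empirical_dist, mean_pred)
    return alpha * mae + beta * ce + gamma * mismatch
```

Under these assumptions, a score of 85 gets a one-hot "B" label, while a score of exactly 80 splits its mass evenly between "C" and "B". That split is the intuition behind softening labels near bucket boundaries, where instructor grades are most likely to differ by one letter.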

Top-level tags: llm, natural language processing, model training
Detailed tags: automated grading, bart, lora, multitask fine-tuning, rubric-aware

Leveraging BART to Assess CS1 C++ Programming Assignments using Rubric-based Criteria


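1️⃣ One-sentence summary

This paper proposes a rubric-aware fine-tuning method for BART that uses multitask learning to predict numeric scores and grade buckets jointly and optimizes how well the predicted grade distribution matches the empirical one, so that automated grading more closely reflects instructors' manual grading behavior and significantly outperforms conventional single-task or code-only grading approaches.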

Source: arXiv: 2606.03814