arXiv submission date: 2026-05-13
📄 Abstract - GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, so selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals, with no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward pass. When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reaches 108.8% of full-data performance with 20% of the data and retains 100.2% with only 5%, and its selected subsets transfer effectively across model backbones.
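The abstract describes step scoring, aggregation, and the gradient proxy only at a high level. The following is a minimal, hypothetical sketch of how such a pipeline could fit together, not GRACE's actual implementation. It assumes the gradient proxy is the output-layer gradient of the cross-entropy loss: for logits z_t = W h_t, each token contributes (softmax(z_t) - onehot(y_t)) h_t^T, so inner products between step gradients can be computed from hidden states and output errors available after one forward pass, without materializing any V×d matrix. The names `grad_inner`, `grad_cos`, `step_scores`, `sample_value`, `select_subset`, the mixing weight `lam`, mean aggregation, and the first-step fallback are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical token record: (hidden state h_t fed to the output layer,
# output error e_t = softmax(z_t) - onehot(y_t)). For logits z_t = W h_t,
# the cross-entropy gradient w.r.t. W for this token is e_t h_t^T.
Token = Tuple[Sequence[float], Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def grad_inner(xs: List[Token], ys: List[Token]) -> float:
    """Frobenius inner product <sum_t e_t h_t^T, sum_s e_s h_s^T>, computed as
    sum_{t,s} (e_t . e_s)(h_t . h_s) without building the matrices."""
    return sum(dot(ex, ey) * dot(hx, hy) for hx, ex in xs for hy, ey in ys)


def grad_cos(xs: List[Token], ys: List[Token], eps: float = 1e-12) -> float:
    """Cosine similarity between two output-layer gradient proxies."""
    # Squared norms are >= 0 in exact arithmetic; clamp rounding noise.
    nx = max(grad_inner(xs, xs), 0.0)
    ny = max(grad_inner(ys, ys), 0.0)
    return grad_inner(xs, ys) / (math.sqrt(nx * ny) + eps)


def step_scores(steps: List[List[Token]], answer: List[Token], lam: float = 0.5) -> List[float]:
    """Score each step by alignment with the answer-token gradient and by
    consistency with the summed gradient of all preceding steps."""
    scores: List[float] = []
    prefix: List[Token] = []
    for step in steps:
        align = grad_cos(step, answer)
        # Assumption: the first step has no preceding trajectory, so its
        # consistency term falls back to its alignment.
        consist = grad_cos(step, prefix) if prefix else align
        scores.append(lam * align + (1.0 - lam) * consist)
        prefix = prefix + step
    return scores


def sample_value(steps: List[List[Token]], answer: List[Token], lam: float = 0.5) -> float:
    """Aggregate step scores into one sample-level value (mean is an assumption)."""
    scores = step_scores(steps, answer, lam)
    return sum(scores) / len(scores) if scores else 0.0


def select_subset(samples: List[Tuple[List[List[Token]], List[Token]]], fraction: float) -> List[int]:
    """Return indices of the top `fraction` of samples by value, in original order."""
    values = [sample_value(steps, answer) for steps, answer in samples]
    k = max(1, round(fraction * len(samples)))
    ranked = sorted(range(len(samples)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    ans = [([1.0, 0.0], [1.0, 0.0])]
    # Toy sample whose steps push the output layer in the answer's direction.
    good = ([[([1.0, 0.0], [1.0, 0.0])], [([2.0, 0.0], [0.5, 0.0])]], ans)
    # Toy sample with an orthogonal step and an opposing step.
    bad = ([[([0.0, 1.0], [0.0, 1.0])], [([1.0, 0.0], [-1.0, 0.0])]], ans)
    print(select_subset([good, bad], 0.5))  # [0]
```

The key design choice in this sketch is the pairwise identity in `grad_inner`: each token pair costs one dot product over the vocabulary-sized errors and one over the hidden states, instead of forming a full per-step gradient matrix. Whether GRACE's representation-level proxy uses this particular identity is not stated in the abstract.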

Top-level tags: llm model training
Detailed tags: data curation, reasoning, gradient alignment, subset selection, efficient training

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training


1️⃣ One-sentence summary

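GRACE is a reasoning-data curation method that scores each reasoning step by how well it aligns with the gradient direction of the final answer and how consistent it is with the preceding steps, then keeps only the highest-value samples; with just 5% of the data it matches full-data training (100.2% of full-data performance), substantially improving post-training efficiency.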

Source: arXiv 2605.13130