arXiv submission date: 2026-01-14
📄 Abstract - ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.

Top-level tags: llm model training natural language processing
Detailed tags: supervised fine-tuning token selection probability-guided training overfitting mitigation language model alignment

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection


1️⃣ One-sentence summary

This paper proposes a new method called ProFit, which prevents rote memorization during training by selectively masking the low-probability, replaceable tokens of the reference answer, allowing large models to perform better on reasoning and mathematical tasks at lower cost.
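The core idea of masking low-probability tokens out of the SFT loss can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `profit_masked_loss`, the `keep_quantile` parameter, and the top-k-by-probability selection rule are assumptions made for the example; the paper's actual selection criterion may differ.

```python
import math

def profit_masked_loss(logits, targets, keep_quantile=0.5):
    """Sketch of probability-guided token selection for SFT.

    For each position, compute the model's probability of the
    reference token (softmax over the logit vector). Tokens whose
    probability falls in the lower tail are treated as replaceable
    surface expressions and excluded from the cross-entropy loss;
    high-probability tokens (the core logical framework, per the
    paper's analysis) are kept.

    logits: list of per-position logit vectors (list of lists)
    targets: list of reference token ids, one per position
    keep_quantile: hypothetical fraction of tokens to keep
    Returns (masked mean cross-entropy, keep mask).
    """
    # Probability of each reference token under a numerically
    # stable softmax.
    probs = []
    for vec, t in zip(logits, targets):
        m = max(vec)
        exps = [math.exp(x - m) for x in vec]
        probs.append(exps[t] / sum(exps))

    # Keep only the top keep_quantile fraction by probability.
    k = max(1, int(round(keep_quantile * len(probs))))
    threshold = sorted(probs, reverse=True)[k - 1]
    mask = [p >= threshold for p in probs]

    # Cross-entropy averaged over the kept tokens only.
    kept = [-math.log(p) for p, keep in zip(probs, mask) if keep]
    return sum(kept) / len(kept), mask
```

In a real training loop this would be applied per sequence on the model's logits, so gradients flow only through the retained high-probability tokens.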

Source: arXiv 2601.09195