arXiv submission date: 2026-04-08
📄 Abstract - On the Step Length Confounding in LLM Reasoning Data Selection

Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, through supervised fine-tuning on large-scale, high-quality datasets. To construct such datasets, existing pipelines generate long reasoning data from more capable Large Language Models (LLMs) and apply manual heuristics or naturalness-based selection methods to filter high-quality samples. Despite the proven effectiveness of naturalness-based data selection, which ranks data by the average log probability assigned by LLMs, our analysis shows that, when applied to LLM reasoning datasets, it systematically prefers samples with longer reasoning steps (i.e., more tokens per step) rather than higher-quality ones, a phenomenon we term step length confounding. Through quantitative analysis, we attribute this phenomenon to low-probability first tokens in reasoning steps: longer steps dilute their influence, thereby inflating the average log probabilities. To address this issue, we propose two variant methods: ASLEC-DROP, which drops first-token probabilities when computing the average log probability, and ASLEC-CASL, which applies a causal debiasing regression to remove the first tokens' confounding effect. Experiments across four LLMs and five evaluation benchmarks demonstrate the effectiveness of our approach in mitigating the step length confounding problem.
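The scoring mechanics described in the abstract can be sketched in Python. This is a minimal illustration, not the paper's implementation: the function names, the step representation (each sample as a list of steps, each step a list of per-token log probabilities from an external LLM), and the simple least-squares form of the CASL regression are all assumptions.

```python
def naturalness_score(steps):
    """Standard naturalness: average log probability over all tokens."""
    logps = [lp for step in steps for lp in step]
    return sum(logps) / len(logps)

def aslec_drop_score(steps):
    """ASLEC-DROP (sketch): drop each step's first token before averaging,
    so low-probability first tokens no longer reward longer steps."""
    logps = [lp for step in steps for lp in step[1:]]
    return sum(logps) / len(logps)

def aslec_casl_scores(samples):
    """ASLEC-CASL (sketch): residualize naturalness scores against mean
    step length via a least-squares fit -- one plausible reading of the
    'causal debiasing regression' named in the abstract."""
    ys = [naturalness_score(s) for s in samples]
    xs = [sum(len(step) for step in s) / len(s) for s in samples]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return [y - (ybar + b * (x - xbar)) for x, y in zip(xs, ys)]

# Two samples of identical quality: every step starts with one
# low-probability token (-5.0) followed by body tokens at -1.0.
short = [[-5.0, -1.0, -1.0]] * 3      # 3 tokens per step
long_ = [[-5.0] + [-1.0] * 9] * 3     # 10 tokens per step

# Longer steps dilute the first-token penalty, so plain naturalness
# prefers long_; ASLEC-DROP scores both samples identically.
print(naturalness_score(short), naturalness_score(long_))  # ≈ -2.33 vs -1.4
print(aslec_drop_score(short), aslec_drop_score(long_))    # -1.0 vs -1.0
```

The toy example reproduces the confounding: the sample with longer steps gets a higher average log probability despite equal per-step quality, while dropping first tokens makes the two samples indistinguishable.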

Top-level tags: llm model training data
Detailed tags: data selection reasoning step length confounding supervised fine-tuning log probability

On the Step Length Confounding in LLM Reasoning Data Selection


1️⃣ One-sentence summary

This paper finds that, when selecting high-quality reasoning training data for large language models, the widely used "naturalness"-based scoring method favors samples with longer reasoning steps rather than higher-quality ones, and proposes two new methods to correct this bias and select better training data.

Source: arXiv:2604.06834