PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding
1️⃣ One-sentence summary
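The paper proposes PragReST, a self-supervised training method that automatically generates counterfactual reasoning data so that large language models learn to grasp what is implied rather than stated in dialogue. It improves performance on pragmatic reasoning tasks without relying on human annotation or a stronger model, and it does not degrade general-knowledge ability.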
Natural language understanding often depends on meanings that are implied rather than explicitly stated, requiring pragmatic reasoning. Despite strong performance on math and logical reasoning, large language models (LLMs) still struggle with making pragmatic inferences, often choosing literal interpretations. To improve LLM pragmatic reasoning, we introduce PragReST, a self-supervised framework that constructs pragmatic QA data, generates counterfactual reasoning traces, and trains models to internalize them through supervised fine-tuning and reinforcement learning, without human-labeled training data or distillation from a stronger teacher. Across four pragmatic benchmarks (PragMega, Ludwig, MetoQA, and AltPrag), PragReST improves over backbone models, task-specific pragmatic tuning baselines, and non-counterfactual variants of the same pipeline. On accuracy-based benchmarks, PragReST improves over the instruct backbone by 5.37 and 5.50 percentage points (absolute) for Qwen3-8B and Qwen3-14B, respectively. Our error analysis and ablations underscore the importance of counterfactual reasoning: PragReST primarily reduces errors caused by failures to contrast observed utterances with plausible alternatives, and removing counterfactual reasoning substantially reduces performance. Moreover, our training preserves out-of-domain performance on general-knowledge and mathematical reasoning benchmarks.
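To make the idea of contrasting an observed utterance with plausible alternatives concrete, here is a minimal Python sketch of how such a contrast could be laid out as a prompt. It is an illustration only, not the authors' pipeline: the `PragmaticItem` class, the `build_counterfactual_prompt` function, the prompt wording, and the dinner example are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PragmaticItem:
    """Hypothetical container for one pragmatic QA item (not from the paper)."""
    context: str
    utterance: str           # what the speaker actually said
    alternatives: list[str]  # plausible things the speaker could have said instead
    question: str


def build_counterfactual_prompt(item: PragmaticItem) -> str:
    """Lay out the observed utterance next to its alternatives in one prompt."""
    lines = [
        f"Context: {item.context}",
        f"Observed utterance: {item.utterance}",
        "Alternatives the speaker could have used:",
    ]
    # Number the alternatives so a reasoning trace can refer to each one.
    lines += [f"  {i}. {alt}" for i, alt in enumerate(item.alternatives, start=1)]
    lines += [
        "For each alternative, explain what would differ had the speaker said it.",
        "Then answer the question.",
        f"Question: {item.question}",
    ]
    return "\n".join(lines)


# Illustrative item: an indirect answer that invites a non-literal reading.
item = PragmaticItem(
    context="Ann asks Bob whether he liked the dinner she cooked.",
    utterance="The dessert was great.",
    alternatives=["Yes, I loved all of it.", "No, I did not like it."],
    question="Did Bob like the whole dinner?",
)
prompt = build_counterfactual_prompt(item)
print(prompt)
```

The point of the layout is that a literal reading ("Bob praised the dessert") leaves the question unanswered, whereas asking why Bob avoided the more direct alternatives pushes toward the implied answer.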
Source: arXiv: 2606.18624