arXiv submission date: 2025-12-08
📄 Abstract - Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors the memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups of up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.
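The abstract describes a branch-and-merge execution pattern: the model decomposes a problem into branches, decodes them in parallel, and reduces the partial results. As a rough mental model only, a minimal sketch of that loop is shown below. Every name here (`plan_branches`, `decode_branch`, `merge`, `parallel_reason`) is a hypothetical placeholder rather than the paper's API, and Python threads stand in for the batched parallel decoding that the actual NPR Engine performs inside SGLang.

```python
# Hypothetical sketch of a branch-and-merge parallel reasoning loop.
# None of these functions come from the paper; they only illustrate
# the control flow the abstract describes.
from concurrent.futures import ThreadPoolExecutor


def plan_branches(prompt: str) -> list[str]:
    # In NPR the model itself emits the decomposition; here we fake
    # a fixed fan-out of three sub-questions for illustration.
    return [f"{prompt} [branch {i}]" for i in range(3)]


def decode_branch(sub_prompt: str) -> str:
    # Stand-in for an independent decoding call; a real serving
    # engine would batch these rather than use Python threads.
    return f"partial answer to: {sub_prompt}"


def merge(partials: list[str]) -> str:
    # Reduction step: combine branch results into one final answer.
    return " | ".join(partials)


def parallel_reason(prompt: str) -> str:
    branches = plan_branches(prompt)
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        partials = list(pool.map(decode_branch, branches))
    return merge(partials)


if __name__ == "__main__":
    print(parallel_reason("What is 12 * 34 + 56?"))
```

In a real system the branching decision is what PAPO trains with reinforcement learning: the fan-out and the topological order of branches are learned policies, not the fixed three-way split used above.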

Top-level tags: llm agents model training
Detailed tags: parallel reasoning reinforcement learning self-distillation policy optimization reasoning efficiency

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning


1️⃣ One-Sentence Summary

This paper proposes NPR, a teacher-free framework that lets large language models self-evolve from emulating sequential thought to genuine parallel reasoning, yielding significant gains in both accuracy and inference speed across multiple tasks.


Source: arXiv 2512.07461