arXiv submission date: 2026-01-23
📄 Abstract - LongCat-Flash-Thinking-2601 Technical Report

We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior under noisy real-world environments. Its advanced capability stems from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning from pre-training to post-training. In particular, the model's strong generalization capability in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across over 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we conduct a systematic analysis and decomposition of real-world noise patterns, and design targeted training procedures to explicitly incorporate such imperfections into the training process, resulting in improved robustness for real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.
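The abstract describes Heavy Thinking only at a high level: test-time scaling by expanding both reasoning depth (longer thinking budgets) and width (parallel reasoning traces). Below is a minimal, hedged sketch of what such a scheme could look like; the function names (`generate`, `extract_answer`), the majority-vote aggregation, and all parameters are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of test-time scaling via parallel thinking.
# `generate` and `extract_answer` are assumed caller-supplied callables,
# not an API defined by the paper.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def heavy_thinking(prompt, generate, extract_answer,
                   width=8, max_thinking_tokens=32768):
    """Expand reasoning depth (token budget per trace) and width
    (number of independent traces), then aggregate final answers."""
    def one_trace(_):
        # Each call produces one independent long reasoning trace ("depth").
        trace = generate(prompt, max_thinking_tokens=max_thinking_tokens)
        return extract_answer(trace)

    # Run `width` traces concurrently ("width").
    with ThreadPoolExecutor(max_workers=width) as pool:
        answers = list(pool.map(one_trace, range(width)))

    # Simple aggregation: the most common final answer wins.
    return Counter(answers).most_common(1)[0][0]
```

In practice the aggregation step could be anything from majority voting to a learned verifier; this sketch only illustrates the depth-times-width structure the abstract refers to.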

Top-level tags: llm agents model training
Detailed tags: mixture of experts agentic reasoning reinforcement learning tool use robustness

LongCat-Flash-Thinking-2601 Technical Report


1️⃣ One-Sentence Summary

This paper introduces LongCat-Flash-Thinking-2601, an advanced open-source large model that, through an innovative Mixture-of-Experts architecture and a unified training framework, excels at understanding and executing complex tasks and at using a wide range of tools, and is especially adept at handling the messy, multi-step interaction scenarios found in the real world.

Source: arXiv 2601.16725