菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-03-19
📄 Abstract - Box Maze: A Process-Control Architecture for Reliable LLM Reasoning

Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve consistency in boundary maintenance, with architectural constraints reducing boundary failure rates from approximately 40% (baseline RLHF) to below 1% under adversarial conditions. While current validation is simulation-based, these preliminary results indicate that process-level control may offer a promising direction for improving reliability in large language model reasoning.

顶级标签: llm systems model evaluation
详细标签: reasoning reliability process-control architecture adversarial robustness cognitive control layers boundary enforcement 或 搜索:

盒子迷宫:一种用于可靠大语言模型推理的过程控制架构 / Box Maze: A Process-Control Architecture for Reliable LLM Reasoning


1️⃣ 一句话总结

这篇论文提出了一种名为‘盒子迷宫’的新架构,通过将大语言模型的推理过程分解为三个受控步骤来防止其‘胡说八道’,初步测试显示它能将模型在对抗性攻击下的出错率从约40%大幅降低到1%以下。

源自 arXiv: 2603.19182