Analyzing the Narration Gap in LLM-Solver Loops

📄 Abstract - Analyzing the Narration Gap in LLM-Solver Loops

Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety-critical or security-critical question can be formulated in logic. Unlike chain of thought, whose steps are sampled from the model's distribution without any formal guarantee, a solver produces a sound and independently verifiable answer. However, this soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied formalization and decision but not narration, the step that turns a formal tool's output into the answer the user sees. To fill this narration gap, we first model the LLM-solver loop as a verified decision procedure. We then evaluate five open-source models under prompt injection and find that certificate gating makes the solver verdict sound, yet an adversary can still invert a verified conclusion across phrasings and channels. We study a mitigation based on a hardened prompt, which significantly reduces successful injections but cannot eliminate them and still fails under adaptive attack. Combining the formal analysis with the empirical studies, we show that in the LLM-solver loop, robustness does not extend to the answer that the user finally reads.
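To make the distinction between a gated verdict and a narrated answer concrete, here is a minimal sketch of what certificate gating could look like for a SAT claim. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define the gate, so the functions `check_sat_certificate` and `gated_verdict`, the DIMACS-style clause encoding, and the `"UNKNOWN"` fallback are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumption: clauses use DIMACS-style integer literals,
# where 3 means x3 and -3 means NOT x3.
Clause = List[int]


def check_sat_certificate(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Return True only if the assignment satisfies every clause."""
    for clause in clauses:
        # A literal is satisfied when the variable's value matches its sign.
        # Unassigned variables (None) never satisfy a literal.
        if not any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            return False
    return True


def gated_verdict(
    clauses: List[Clause],
    claimed_verdict: str,
    assignment: Optional[Dict[int, bool]] = None,
) -> str:
    """Hypothetical gate: accept a 'SAT' claim only if its certificate checks.

    Anything that cannot be independently verified here, including
    'UNSAT' claims, which would need a proof checker, becomes 'UNKNOWN'.
    """
    if (
        claimed_verdict == "SAT"
        and assignment is not None
        and check_sat_certificate(clauses, assignment)
    ):
        return "SAT"
    return "UNKNOWN"


if __name__ == "__main__":
    formula = [[1, 2], [-1, 2]]  # (x1 OR x2) AND (NOT x1 OR x2)
    print(gated_verdict(formula, "SAT", {1: True, 2: True}))
    print(gated_verdict(formula, "SAT", {1: True, 2: False}))
```

Note that a gate of this kind only constrains the verdict string. Whatever text the model generates afterwards to explain that verdict is outside the check, which is where the narration step described in the abstract sits.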


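1️⃣ One-Sentence Summary

This paper shows that in AI systems, even after a logic solver has produced a correct conclusion, the language model can be attacked through malicious prompts while explaining the result to the user, so that the answer finally presented to the user is tampered with and the reliability of the entire reasoning pipeline is undermined.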
