Neuro-Symbolic Verification on Instruction Following of LLMs
1️⃣ One-Sentence Summary
This paper proposes a general-purpose verification framework called NSVIF, which models user instructions as constraints and combines logical reasoning with semantic analysis to check whether an LLM's output follows its instructions, thereby helping improve model safety and reliability.
A fundamental problem of applying Large Language Models (LLMs) to important applications is that LLMs do not always follow instructions, and violations are often hard to observe or check. In LLM-based agentic workflows, such violations can propagate and amplify along reasoning chains, causing task failures and system incidents. This paper presents NSVIF, a neuro-symbolic framework for verifying whether an LLM's output follows the instructions used to prompt the LLM. NSVIF is a universal, general-purpose verifier; it makes no assumption about the instruction or the LLM. NSVIF formulates instruction-following verification as a constraint-satisfaction problem by modeling user instructions as constraints. NSVIF models both logical and semantic constraints; constraint solving is done by a unified solver that orchestrates logical reasoning and semantic analysis. To evaluate NSVIF, we develop VIFBENCH, a new benchmark for instruction-following verifiers with fine-grained data labels. Experiments show that NSVIF significantly outperforms LLM-based approaches and provides interpretable feedback. We also show that feedback from NSVIF helps improve LLMs' instruction-following capability without post-training.
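The abstract describes formulating instruction-following verification as a constraint-satisfaction problem over the model's output. The following is a minimal sketch of that idea, not NSVIF's actual implementation: each instruction clause becomes a named constraint with a checker, logical constraints are verified symbolically, and the `is_english` check stands in as a crude placeholder for where a neural component (e.g., an LLM judge) would perform semantic analysis. The example instruction and all names here are hypothetical.

```python
# Sketch (assumed design, not NSVIF's code): instructions as constraints,
# verification as constraint satisfaction over the LLM output.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    name: str
    check: Callable[[str], bool]  # True iff the output satisfies the constraint


def verify(output: str, constraints: List[Constraint]) -> List[str]:
    """Return names of violated constraints; an empty list means compliant."""
    return [c.name for c in constraints if not c.check(output)]


# Hypothetical instruction: "Answer in at most 50 words, in English,
# and mention the product name 'Acme'."
constraints = [
    # Logical constraints: checkable symbolically.
    Constraint("max_50_words", lambda o: len(o.split()) <= 50),
    Constraint("mentions_Acme", lambda o: "Acme" in o),
    # Semantic constraint: placeholder where a neural judge would assess
    # meaning rather than surface form (ASCII check is only a crude proxy).
    Constraint("is_english", lambda o: o.isascii()),
]

print(verify("Acme widgets ship worldwide.", constraints))  # → []
```

Returning the names of violated constraints, rather than a single pass/fail bit, mirrors the paper's claim that the verifier produces interpretable feedback that can be fed back to the LLM.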
Source: arXiv: 2601.17789