How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism
1️⃣ One-sentence summary
This paper finds that instruction-tuned large language models do not work through a single, general-purpose "instruction-following" mechanism; instead, they act more like a "coordinator," flexibly composing and invoking their existing linguistic skills to complete each task.
Instruction tuning is commonly assumed to endow language models with a domain-general ability to follow instructions, yet the underlying mechanism remains poorly understood. Does instruction-following rely on a universal mechanism or on compositional skill deployment? We investigate this through diagnostic probing across nine diverse tasks in three instruction-tuned models. Our analysis provides converging evidence against a universal mechanism. First, general probes trained across all tasks consistently underperform task-specific specialists, indicating limited representational sharing. Second, cross-task transfer is weak and clustered by skill similarity. Third, causal ablation reveals sparse, asymmetric dependencies rather than shared representations. Tasks also stratify by complexity across layers, with structural constraints decodable in early layers and semantic tasks only in later ones. Finally, temporal analysis shows that constraint satisfaction operates as dynamic monitoring during generation rather than pre-generation planning. These findings indicate that instruction-following is better characterized as skillful coordination of diverse linguistic capabilities than as deployment of a single abstract constraint-checking process.
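To make the specialist-vs-generalist probe comparison concrete, here is a minimal sketch (not the paper's code; all data and names are invented for illustration). Two synthetic "tasks" have labels that are linearly decodable from mock hidden states, but along conflicting directions, so task-specific logistic probes succeed while a single probe pooled across both tasks cannot — the signature of limited representational sharing.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # mock hidden-state dimensionality

def make_task(n, flip):
    # Synthetic "hidden states": the label is linearly decodable from
    # coordinate 0; `flip` reverses the decision direction, mimicking
    # two tasks that encode their constraint in incompatible ways.
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(float)
    if flip:
        y = 1.0 - y
    return X, y

def train_probe(X, y, steps=500, lr=0.5):
    # Plain logistic-regression probe trained by gradient descent.
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return (((X @ w + b) > 0).astype(float) == y).mean()

Xa, ya = make_task(400, flip=False)
Xb, yb = make_task(400, flip=True)

# Task-specific "specialist" probes, one per task.
wa, ba = train_probe(Xa, ya)
wb, bb = train_probe(Xb, yb)

# One "general" probe trained on both tasks pooled together.
Xg, yg = np.vstack([Xa, Xb]), np.concatenate([ya, yb])
wg, bg = train_probe(Xg, yg)

spec = (accuracy(wa, ba, Xa, ya) + accuracy(wb, bb, Xb, yb)) / 2
gen = (accuracy(wg, bg, Xa, ya) + accuracy(wg, bg, Xb, yb)) / 2
print(f"specialist acc: {spec:.2f}, general acc: {gen:.2f}")
```

Because the two label directions cancel in the pooled data, the general probe hovers near chance while each specialist is near-perfect; the paper's analogous gap on real model activations is its first piece of evidence against a universal mechanism.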
Source: arXiv: 2604.06015