arXiv submission date: 2026-02-02
📄 Abstract - Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models

Steering vectors (SVs) offer a lightweight way to control large language models (LLMs) at inference time by shifting hidden activations, providing a practical middle ground between prompting and fine-tuning. Yet SVs can be unreliable in practice. Some concepts are unsteerable, and even when steering helps on average it can backfire for a non-trivial fraction of inputs. Reliability also degrades in long-form generation and multi-attribute steering. We take a geometric view of these failures. A static SV applies the same update vector everywhere in representation space, implicitly assuming that the concept-improving direction is constant across contexts. When the locally effective direction varies with the current activation, a single global vector can become misaligned, which yields weak or reversed effects. Guided by this perspective, we propose Steering Vector Fields (SVF), which learns a differentiable concept scoring function whose local gradient defines the steering direction at each activation, making interventions explicitly context-dependent. This formulation supports coordinated multi-layer interventions in a shared, aligned concept space, and enables efficient long-form and multi-attribute control within a unified framework. Across multiple LLMs and steering tasks, SVF delivers stronger and more reliable control, improving the practicality of inference-time steering.
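To make the geometric contrast concrete, below is a minimal PyTorch sketch (not from the paper; the scorer architecture, the step size `alpha`, and the target layer are assumptions). A static steering vector adds the same offset to every hidden state, while an SVF-style intervention moves each activation along the local gradient of a learned concept scorer, so the direction changes with the context.

```python
# Minimal sketch, assuming a single intervention layer of a decoder LLM.
# Not the authors' implementation; scorer shape and alpha are illustrative.
import torch
import torch.nn as nn

hidden_dim = 4096  # assumed hidden size of the steered layer

# Static steering vector: the same update is applied at every activation.
static_sv = torch.randn(hidden_dim)

def steer_static(h: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Shift hidden states h by one fixed, context-independent vector."""
    return h + alpha * static_sv

# SVF-style intervention: a differentiable concept scorer s(h); its local
# gradient at the current activation defines the steering direction.
concept_scorer = nn.Sequential(
    nn.Linear(hidden_dim, 512),
    nn.Tanh(),
    nn.Linear(512, 1),
)

def steer_field(h: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Move each hidden state along grad_h s(h), evaluated at h itself,
    so the intervention direction adapts to the current context."""
    h = h.detach().requires_grad_(True)
    score = concept_scorer(h).sum()
    (direction,) = torch.autograd.grad(score, h)
    direction = direction / (direction.norm(dim=-1, keepdim=True) + 1e-8)
    return (h + alpha * direction).detach()

# The field direction differs across activations; the static one does not.
h_batch = torch.randn(2, hidden_dim)
print(steer_static(h_batch).shape, steer_field(h_batch).shape)
```

In practice such a hook would be attached to one or more transformer layers during generation; training the scorer (e.g., to separate concept-positive from concept-negative activations) is what the paper's learning procedure would provide and is omitted here.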

Top-level tags: llm model training theory
Detailed tags: steering vectors, inference-time control, representation geometry, context-aware latent intervention

Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models


1️⃣ One-sentence summary

This paper proposes a method called Steering Vector Fields, which lets the steering signal adapt dynamically to the current context. This addresses the unstable, unreliable behavior of existing inference-time steering of large language models and yields more precise, more reliable control of model behavior.

Source: arXiv: 2602.01654