arXiv submission date: 2026-02-17
📄 Abstract - PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra

Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates through three stages: Persona-Base extracts orthogonal trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve up to 91% win rates across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
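The extraction and algebra steps described in the abstract can be illustrated on toy vectors. This is a minimal, hypothetical sketch, not the authors' code. It assumes that "contrastive activation analysis" means taking the difference of mean hidden-state activations between trait-eliciting and trait-opposing prompts (a common steering-vector recipe that may differ from Persona-Base's exact procedure). It also uses Gram-Schmidt as a stand-in for however the paper makes trait vectors orthogonal, and it runs on made-up 4-dimensional activations. The helper names `contrastive_direction` and `orthogonalize` are hypothetical.

```python
import math

Vector = list[float]

def mean(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def contrastive_direction(pos_acts: list[Vector], neg_acts: list[Vector]) -> Vector:
    """Hypothetical helper: unit vector along mean(trait-present) - mean(trait-absent) activations."""
    return normalize([p - q for p, q in zip(mean(pos_acts), mean(neg_acts))])

def orthogonalize(directions: list[Vector]) -> list[Vector]:
    """Gram-Schmidt (assumed stand-in): strip each direction's projection onto earlier ones."""
    basis: list[Vector] = []
    for d in directions:
        v = list(d)
        for b in basis:
            c = dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        basis.append(normalize(v))
    return basis

# Toy 4-d "activations": made-up numbers standing in for hidden states, not data from the paper.
extra = contrastive_direction(
    [[2.0, 0.1, 0.0, 0.0], [1.8, -0.1, 0.0, 0.0]],    # prompts eliciting extraversion
    [[-2.0, 0.1, 0.0, 0.0], [-1.8, -0.1, 0.0, 0.0]],  # prompts eliciting introversion
)
agree = contrastive_direction(
    [[0.5, 1.0, 0.0, 0.0], [0.5, 1.2, 0.0, 0.0]],     # prompts eliciting agreeableness
    [[0.1, -1.0, 0.0, 0.0], [0.1, -1.2, 0.0, 0.0]],   # prompts eliciting disagreeableness
)
print("cosine before:", dot(extra, agree))  # about 0.18: close to, but not exactly, orthogonal
e, a = orthogonalize([extra, agree])
print("cosine after:", dot(e, a))           # ~0 up to floating-point error

# Vector algebra on the orthogonalized basis (coefficients are illustrative).
stronger_extraversion = [2.0 * x for x in e]               # intensity: scalar multiplication
extraverted_and_agreeable = [x + y for x, y in zip(e, a)]  # composition: addition
extraverted_not_agreeable = [x - y for x, y in zip(e, a)]  # suppression: subtraction
```

Orthogonalizing first means that scaling or adding one trait direction leaves the projection onto the other trait directions unchanged, which is one reason approximate orthogonality keeps this kind of algebra predictable.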

Top-level tags: llm, model training, model evaluation
Detailed tags: personality control, activation vectors, inference-time control, vector algebra, behavioral control

PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra


1️⃣ One-sentence summary

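This paper proposes PERSONA, a training-free method that controls a large language model's personality by directly manipulating vectors that represent personality traits in the model's internal activation space, letting the model dynamically and flexibly exhibit a single trait or a combination of traits with results comparable to fine-tuning.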
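To affect generation, a composed trait vector has to be added to the model's hidden states. The following minimal sketch assumes a simple additive intervention `h' = h + sum_i alpha_i * v_i` on a single hidden-state vector. In a real transformer this would typically be applied to the residual-stream output of chosen layers, for example from a PyTorch forward hook, but which layers and coefficients PERSONA uses is not stated here. The per-turn coefficient dictionaries are written by hand to mimic context-dependent adaptation; they are not Persona-Flow's selection rule, and the names `compose` and `steer` are hypothetical.

```python
Vector = list[float]

def compose(traits: dict[str, Vector], coeffs: dict[str, float]) -> Vector:
    """Hypothetical helper: sum_i alpha_i * v_i; a negative alpha suppresses that trait."""
    dim = len(next(iter(traits.values())))
    out = [0.0] * dim
    for name, alpha in coeffs.items():
        out = [o + alpha * x for o, x in zip(out, traits[name])]
    return out

def steer(hidden: Vector, steering: Vector) -> Vector:
    """Hypothetical helper: additive intervention on one hidden state, h' = h + s."""
    return [h + s for h, s in zip(hidden, steering)]

# Toy orthonormal trait directions (assumed, e.g. produced by an extraction step).
traits = {
    "extraversion": [1.0, 0.0, 0.0, 0.0],
    "agreeableness": [0.0, 1.0, 0.0, 0.0],
}
# Coefficients chosen by hand per turn; a context-aware controller would set these instead.
turns = [
    {"extraversion": 1.5},                         # boost one trait
    {"extraversion": 1.0, "agreeableness": 2.0},   # compose two traits
    {"agreeableness": -1.0},                       # suppress a trait
]
hidden = [0.2, -0.3, 0.5, 0.1]  # made-up hidden state
for coeffs in turns:
    print(coeffs, steer(hidden, compose(traits, coeffs)))
```

Keeping composition (`compose`) separate from injection (`steer`) means the coefficients can change from turn to turn while the extracted trait vectors stay fixed.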

From arXiv: 2602.15669