arXiv submission date: 2026-04-16
📄 Abstract - Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

Persistent language-model agents increasingly combine tool use, tiered memory, reflective prompting, and runtime adaptation. In such systems, behavior is shaped not only by current prompts but by mutable internal conditions that influence future action. This paper introduces layered mutability, a framework for reasoning about that process across five layers: pretraining, post-training alignment, self-narrative, memory, and weight-level adaptation. The central claim is that governance difficulty rises when mutation is rapid, downstream coupling is strong, reversibility is weak, and observability is low, creating a systematic mismatch between the layers that most affect behavior and the layers humans can most easily inspect. I formalize this intuition with simple drift, governance-load, and hysteresis quantities, connect the framework to recent work on temporal identity in language-model agents, and report a preliminary ratchet experiment in which reverting an agent's visible self-description after memory accumulation fails to restore baseline behavior. In that experiment, the estimated identity hysteresis ratio is 0.68. The main implication is that the salient failure mode for persistent self-modifying agents is not abrupt misalignment but compositional drift: locally reasonable updates that accumulate into a behavioral trajectory that was never explicitly authorized.
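The abstract reports an "identity hysteresis ratio" of 0.68 for the ratchet experiment but does not spell out the formula. A minimal sketch of one plausible definition, assuming behavior is summarized as an embedding vector and the ratio is residual drift after the revert divided by drift induced before it (the function name, vectors, and definition here are illustrative assumptions, not the paper's stated method):

```python
import numpy as np

def hysteresis_ratio(baseline, post_mutation, post_revert):
    """Assumed definition: residual behavioral displacement after
    reverting the visible self-description, as a fraction of the
    displacement induced by memory accumulation.
    1.0 = fully sticky (revert does nothing), 0.0 = fully reversible."""
    drift_induced = np.linalg.norm(post_mutation - baseline)
    drift_residual = np.linalg.norm(post_revert - baseline)
    return drift_residual / drift_induced

# Toy behavioral embeddings (hypothetical values for illustration)
baseline      = np.array([0.0, 0.0])
post_mutation = np.array([1.0, 0.0])   # after memory accumulation
post_revert   = np.array([0.68, 0.0])  # after reverting self-description

print(round(hysteresis_ratio(baseline, post_mutation, post_revert), 2))
```

Under this toy setup the ratio comes out to 0.68: reverting the visible self-description removes only part of the accumulated displacement, which is the ratchet effect the abstract describes.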

Top-level tags: agents llm systems
Detailed tags: agent governance, self-modification, compositional drift, persistent agents, mutable state

Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents


1️⃣ One-sentence summary

This paper proposes a "layered mutability" framework for analyzing AI agents that can modify themselves over time, arguing that their principal risk is not sudden loss of control but an unauthorized drift in behavioral trajectory that accumulates from many individually reasonable small changes.

Source: arXiv:2604.14717