arXiv submission date: 2026-03-16
📄 Abstract - Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections

Multi-stream transformer architectures have recently been proposed as a promising direction for managing representation collapse and the vanishing gradient problem for residual connections, yet their internal mechanisms remain unexplored. In particular, the recently introduced Manifold-Constrained Hyper-Connections (mHC) architecture posits multiple residual streams with constrained interaction, but lacks in-depth mechanistic analysis. We present the first open-source mHC language model (this https URL) and analyze the multiple-stream architecture with a suite of representation-level metrics and causal interventions to probe how parallel streams encode and utilize information. Specifically, we introduce a systematic stream ablation-and-rescue framework that enables direct causal comparison of residual streams during inference. Through targeted pairwise interventions and controlled recovery experiments, we distinguish functional redundancy from asymmetric utilization and reveal how information is distributed across streams beyond what is observable from representational similarity alone.
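The abstract does not spell out the ablation-and-rescue procedure, so the following is only a minimal sketch under stated assumptions: ablation is taken to mean zeroing one stream, rescue is taken to mean patching a donor stream's clean activation into the ablated slot, and recovery is scored as the fraction of ablation damage undone. The names `toy_model`, `ablate`, `rescue`, and `recovery_matrix` are hypothetical, and the toy readout stands in for a real mHC forward pass.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Streams = List[Vector]  # one activation vector per residual stream


def toy_model(streams: Sequence[Vector]) -> float:
    """Hypothetical stand-in for a forward pass: sum every feature of every stream."""
    return sum(sum(s) for s in streams)


def ablate(streams: Sequence[Vector], target: int) -> Streams:
    """Assumed ablation: zero out the target stream, leave the others untouched."""
    out = [list(s) for s in streams]
    out[target] = [0.0] * len(out[target])
    return out


def rescue(ablated: Sequence[Vector], clean: Sequence[Vector],
           target: int, donor: int) -> Streams:
    """Assumed rescue: fill the ablated slot with the donor stream's clean activation."""
    out = [list(s) for s in ablated]
    out[target] = list(clean[donor])
    return out


def recovery_matrix(clean: Sequence[Vector],
                    model: Callable[[Sequence[Vector]], float]
                    ) -> List[List[Optional[float]]]:
    """Entry [i][j] is the fraction of damage from ablating stream i that is
    undone by rescuing with stream j (1.0 = full recovery, 0.0 = none).
    Entries are None when ablating stream i causes no damage at all."""
    base = model(clean)
    n = len(clean)
    matrix: List[List[Optional[float]]] = []
    for i in range(n):
        ablated = ablate(clean, i)
        damage = abs(model(ablated) - base)
        row: List[Optional[float]] = []
        for j in range(n):
            if damage == 0.0:
                row.append(None)
                continue
            residual = abs(model(rescue(ablated, clean, i, j)) - base)
            row.append(1.0 - residual / damage)
        matrix.append(row)
    return matrix


# Streams 0 and 1 are identical (redundant); stream 2 carries different content.
clean_streams: Streams = [[1.0, 2.0], [1.0, 2.0], [0.5, 0.5]]
rec = recovery_matrix(clean_streams, toy_model)
```

In this toy setup, a stream that fully rescues another behaves as functionally redundant with it, while a pair whose recovery scores differ by direction (`rec[i][j]` versus `rec[j][i]`) illustrates asymmetric utilization. Two streams could look similar under a representational metric and still yield different recovery scores, which is the kind of gap a causal comparison is meant to expose.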

Top-level tags: llm model training theory
Detailed tags: transformer architecture residual connections causal analysis representation learning model interpretability

Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections


1️⃣ One-sentence summary

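Using a new "ablate-and-rescue" causal analysis method, this paper gives a first in-depth account of how the parallel residual streams in multi-stream Transformer architectures (specifically the mHC model) encode and use information, separating functional redundancy between streams from asymmetric utilization.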

Source: arXiv: 2603.14833