arXiv submission date: 2026-05-18
📄 Abstract - Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

The lottery ticket hypothesis posits that dense networks contain sparse subnetworks, "winning tickets," that, when rewound to their initial weights and retrained in isolation, match the performance of the full model. We ask a more mechanistic question: what internal object does a winning ticket preserve? We work in a combinatorial, clause-structured toy setting that admits an interpretable feature-space representation with well-defined combinatorial distances between features. We show that winning tickets in weight space correspond to precursor locations in feature space that are already near, at initialization, to the final feature-channel codes. Dense SGD resolves these locations through structured selection: proximal locations either converge to final codes or are rejected, with rejection concentrated at more crowded neurons, implicating competition under superposition. A winning ticket is thus a family of compatible code locations that jointly balance proximity to final codes with low inter-feature interference. Sparse retraining often re-expresses the same clause/template family on a different row, so the preserved object is family-level rather than microscopic row identity. We validate this account with lightweight probes based on feature-space distance and motion; in our setting, these probes frequently outperform established weight-based ticket discovery methods in both accuracy and exact code recovery. Although these findings are grounded in a toy setting, they suggest that the lottery ticket structure is governed by hidden feature-space geometry rather than weight-space subnetwork identity.
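
To make the idea of a distance-based probe concrete, here is a minimal sketch of how rows at initialization might be ranked by their proximity to final codes. It is an illustration only, not the paper's method: the sign-threshold binarization, the use of Hamming distance as the combinatorial distance, and the names `binarize`, `hamming`, and `rank_rows_by_proximity` are all assumptions, and the weights and codes are made-up toy values.

```python
# Illustrative sketch only. The binarization rule, the Hamming metric, and
# every name below are assumptions, not the paper's actual probe.

def hamming(a, b):
    """Count positions where two equal-length 0/1 tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def binarize(row, threshold=0.0):
    """Map a weight row to a 0/1 code: 1 where the weight exceeds threshold."""
    return tuple(1 if w > threshold else 0 for w in row)

def rank_rows_by_proximity(init_weights, final_codes):
    """Return (distance, row_index, nearest_code_index) tuples, nearest first."""
    ranked = []
    for i, row in enumerate(init_weights):
        code = binarize(row)
        # Nearest final code for this row; ties go to the lower code index.
        dist, j = min((hamming(code, c), j) for j, c in enumerate(final_codes))
        ranked.append((dist, i, j))
    return sorted(ranked)

# Hypothetical toy data: three rows at initialization, two final codes.
init_weights = [
    [0.9, 0.8, -0.1, -0.3],   # binarizes to (1, 1, 0, 0)
    [-0.2, 0.4, 0.5, -0.6],   # binarizes to (0, 1, 1, 0)
    [0.3, -0.7, 0.2, 0.1],    # binarizes to (1, 0, 1, 1)
]
final_codes = [(1, 1, 0, 0), (0, 0, 1, 1)]

ranking = rank_rows_by_proximity(init_weights, final_codes)
print(ranking)
```

Under these assumptions, rows at the front of the ranking play the role of candidate precursor locations. A fuller probe along the lines the abstract describes would also need to account for crowding, since proximal locations at crowded neurons are reported to be rejected more often.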

Top-level tags: machine learning, interpretability
Detailed tags: lottery ticket hypothesis, feature space, mechanistic interpretability, superposition, weight space

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space


1️⃣ One-sentence summary

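Using a simplified toy model, this study shows that the "winning tickets" (sparse subnetworks) in neural networks correspond to "precursor locations" that are already close to the final feature codes at initialization, so that what defines a ticket is geometric structure in feature space rather than a specific subnetwork in weight space; the finding is validated with lightweight probes such as feature-space distance.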

Source: arXiv: 2605.17704