Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

📄 Abstract - Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

The lottery ticket hypothesis posits that dense networks contain sparse subnetworks, "winning tickets," that, when rewound to their initial weights and retrained in isolation, match the performance of the full model. We ask a more mechanistic question: what internal object does a winning ticket preserve? We work in a combinatorial, clause-structured toy setting that admits an interpretable feature-space representation with well-defined combinatorial distances between features. We show that winning tickets in weight space correspond to precursor locations in feature space that are already close, at initialization, to the final feature-channel codes. Dense SGD resolves these locations through structured selection: proximal locations either converge to final codes or are rejected, with rejection concentrated at more crowded neurons, implicating competition under superposition. A winning ticket is thus a family of compatible code locations that jointly balance proximity to final codes with low inter-feature interference. Sparse retraining often re-expresses the same clause/template family on a different row, so the preserved object is family-level rather than microscopic row identity. We validate this account with lightweight probes based on feature-space distance and motion; in our setting, these probes frequently outperform established weight-based ticket discovery methods in both accuracy and exact code recovery. Although these findings are grounded in a toy setting, they suggest that lottery-ticket structure is governed by hidden feature-space geometry rather than by weight-space subnetwork identity.
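
The contrast the abstract draws between weight-based ticket discovery and feature-space proximity probes can be sketched in a few lines. The first two functions below are a simplified, one-shot version of the magnitude-pruning-and-rewinding recipe associated with the lottery ticket hypothesis; they stand in for "weight-based ticket discovery" generically and are not necessarily the baselines the authors compared against. The remaining functions are a hypothetical feature-space proximity probe: the names `hamming`, `proximity_scores`, and `proximity_ticket`, the representation of feature locations as binary tuples, and the use of Hamming distance as the "combinatorial distance" are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Code = Tuple[int, ...]  # hypothetical: a feature-space location as a binary pattern


def magnitude_ticket_mask(final_weights: Sequence[float], keep_fraction: float) -> List[bool]:
    """One-shot magnitude pruning: keep the largest-|w| fraction of the trained weights."""
    n_keep = max(1, round(keep_fraction * len(final_weights)))
    order = sorted(range(len(final_weights)), key=lambda i: abs(final_weights[i]), reverse=True)
    kept = set(order[:n_keep])
    return [i in kept for i in range(len(final_weights))]


def rewind(init_weights: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Rewinding: surviving weights return to their initial values; pruned weights are zeroed."""
    return [w if keep else 0.0 for w, keep in zip(init_weights, mask)]


def hamming(a: Code, b: Code) -> int:
    """Assumed combinatorial distance: number of positions where two codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def proximity_scores(init_locations: Sequence[Code], final_codes: Sequence[Code]) -> List[int]:
    """Hypothetical probe: distance from each initial location to its nearest final code."""
    return [min(hamming(loc, code) for code in final_codes) for loc in init_locations]


def proximity_ticket(init_locations: Sequence[Code], final_codes: Sequence[Code], k: int) -> List[int]:
    """Hypothetical probe: indices of the k initial locations closest to any final code."""
    scores = proximity_scores(init_locations, final_codes)
    # sorted() is stable, so tied scores keep their original index order.
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


if __name__ == "__main__":
    final_w = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2]
    init_w = [0.3, 0.1, -0.2, -0.5, 0.4, 0.05]
    mask = magnitude_ticket_mask(final_w, keep_fraction=0.5)
    print(mask)                  # [True, False, True, True, False, False]
    print(rewind(init_w, mask))  # [0.3, 0.0, -0.2, -0.5, 0.0, 0.0]

    codes = [(1, 0, 1, 0), (0, 1, 1, 0)]
    locs = [(1, 0, 1, 1), (0, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 1)]
    print(proximity_scores(locs, codes))     # [1, 3, 0, 3]
    print(proximity_ticket(locs, codes, 2))  # [2, 0]
```

The weight-based mask is chosen from trained magnitudes, while the proximity probe only needs initial locations and final codes. The abstract's account also involves inter-feature interference (rejection at crowded neurons) and feature-space motion, which this proximity-only sketch leaves out; a fuller probe would presumably trade distance off against crowding.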


1️⃣ One-Sentence Summary

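Using a simplified toy model, the study shows that a neural network's "winning tickets" (sparse subnetworks) correspond to "precursor locations" that are already close to their final feature codes at initialization, so that what a ticket captures is geometric structure in feature space rather than a specific subnetwork in weight space, a finding validated with lightweight probes such as feature-space distance.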
