arXiv submission date: 2026-02-23
📄 Abstract - I Dropped a Neural Net

A recent Dwarkesh Patel podcast with John Collison and Elon Musk featured an interesting puzzle from Jane Street: they trained a neural net, shuffled all 96 layers, and asked for them to be put back in order. Given the unlabelled layers of a Residual Network and its training dataset, we recover the exact ordering of the layers. The problem decomposes into pairing each block's input and output projections ($48!$ possibilities) and ordering the reassembled blocks ($48!$ possibilities), for a combined search space of $(48!)^2 \approx 10^{122}$, more than the number of atoms in the observable universe. We show that stability conditions during training, such as dynamic isometry, leave the product $W_{\text{out}} W_{\text{in}}$ for correctly paired layers with a negative diagonal structure, allowing us to use the diagonal dominance ratio as a pairing signal. For ordering, we initialize with a rough proxy such as the delta-norm or $\|W_{\text{out}}\|_F$, then hill-climb to zero mean squared error.
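The pairing signal can be illustrated with a small sketch. This is not the paper's code: the widths, the noise level, and the assumption that training leaves $W_{\text{out}} \approx -W_{\text{in}}^\top$ (one concrete form of the "negative diagonal structure" the abstract describes) are all illustrative. Given that assumption, the diagonal dominance ratio of $W_{\text{out}} W_{\text{in}}$ is large only for correctly matched projections, so a per-row argmax recovers the pairing:

```python
import numpy as np

def diag_dominance(W_out, W_in):
    """Mean |diagonal| over mean |off-diagonal| of W_out @ W_in."""
    P = W_out @ W_in
    d = P.shape[0]
    diag = np.abs(np.diag(P)).mean()
    off = (np.abs(P).sum() - np.abs(np.diag(P)).sum()) / (d * d - d)
    return diag / off

rng = np.random.default_rng(0)
d, h, n = 16, 64, 6  # residual width, hidden width, number of blocks

# Synthetic "trained" blocks: W_in maps d -> h, W_out maps h -> d, with
# W_out roughly -W_in^T plus noise, standing in for the negative diagonal
# structure that stability conditions are said to induce.
ins = [rng.normal(size=(h, d)) / np.sqrt(d) for _ in range(n)]
outs = [-w.T + 0.1 * rng.normal(size=(d, h)) for w in ins]

perm = rng.permutation(n)              # shuffle the output projections
shuffled = [outs[p] for p in perm]

# Score every (output, input) candidate pair; matched pairs stand out,
# mismatched products look like random matrices with ratio near 1.
scores = np.array([[diag_dominance(o, i) for i in ins] for o in shuffled])
recovered = scores.argmax(axis=1)
print((recovered == perm).all())
```

A per-row argmax suffices here because matched scores dominate by a wide margin; with noisier real weights one would solve the full assignment problem (e.g. `scipy.optimize.linear_sum_assignment`) instead.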

Top tags: model training theory machine learning
Detailed tags: neural network architecture layer ordering residual networks dynamic isometry optimization puzzle

I Dropped a Neural Net: Recovering Its Original Structure from Shuffled Layers


1️⃣ One-Sentence Summary

This paper presents a method that, given only the training data and the shuffled layers of a neural network, recovers the original pairing and ordering of the layers, solving a search problem of seemingly intractable size.

Source: arXiv:2602.19845