Abstract - Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks
A common heuristic used to explain the generalization of first-order gradient methods on non-convex neural networks is that "flat interpolators generalize well" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017), where flatness can be measured by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) showed that any interpolator can be made sharper or flatter by exploiting symmetries of the network, which change flatness while leaving both the population and empirical losses unchanged. This result makes the earlier heuristic vacuous as stated. In this paper, we show that for learning an unknown multi-index model with $2$-layer non-convex homogeneous neural networks, there is a connection between flatness and generalization, despite the existence of symmetries. This connection pertains to the "flattest" interpolators, i.e., the interpolators whose flatness measure is order-wise minimal among all interpolators. First, we show that there exists a natural class of non-generalizing interpolators whose flatness cannot be brought close to the flattest possible, even using symmetries. Second, we show that for data generated by a sum of single-index models, if the approximation error and label noise are low, any flattest interpolator achieves small population loss, i.e., the flattest interpolators always generalize. This establishes a direct link between flatness and generalization that applies to a large class of activations and realistic data distributions.
Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks
1️⃣ One-sentence summary
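For two-layer homogeneous neural networks learning an unknown multi-index model, this paper shows that although network symmetries can make a given solution flatter or sharper, a reliable link remains between the "flattest" solutions (those with the smallest flatness measure among all solutions) and generalization. On the one hand, there is a class of non-generalizing solutions that cannot be made flat through symmetries. On the other hand, when the data are generated by a sum of single-index models and the approximation error and label noise are small, every flattest solution generalizes well. Together these results give theoretical support to the rule of thumb that "flat solutions generalize well".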
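The rescaling symmetry behind the Dinh et al. (2017) objection can be illustrated numerically. The sketch below is not taken from the paper: it assumes a ReLU activation (one example of a positively homogeneous activation), a squared loss normalized as 1/(2n), and labels produced by the network itself so that it interpolates. The function names `forward`, `hessian_trace`, and `rescale` are hypothetical. Under these assumptions the diagonal second derivatives of the network output vanish almost everywhere, so the Hessian trace of the empirical loss equals the mean squared norm of the parameter gradient of the output.

```python
import numpy as np

def forward(a, W, X):
    """Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x)."""
    return np.maximum(X @ W.T, 0.0) @ a

def hessian_trace(a, W, X):
    """Trace of the Hessian of L = (1/2n) * sum_i (f(x_i) - y_i)^2.

    Assumption: ReLU activation, so the diagonal second derivatives of f
    are zero almost everywhere and the trace reduces to the mean squared
    norm of the gradient of f with respect to (a, W).
    """
    pre = X @ W.T                                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                      # df/da_j
    gate = (pre > 0).astype(float)                  # ReLU derivative
    sq_norm_x = np.sum(X ** 2, axis=1, keepdims=True)
    grad_a_sq = act ** 2                            # (df/da_j)^2
    grad_W_sq = gate * (a ** 2) * sq_norm_x         # ||df/dw_j||^2
    return float(np.mean(np.sum(grad_a_sq + grad_W_sq, axis=1)))

def rescale(a, W, c):
    """Symmetry (a_j, w_j) -> (a_j / c, c * w_j) for c > 0.

    Because relu(c * z) = c * relu(z) for c > 0, the network function,
    and hence every loss value, is unchanged.
    """
    return a / c, W * c

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))      # illustrative synthetic inputs
W = rng.normal(size=(8, 5))
a = rng.normal(size=8)

for c in (0.1, 1.0, 10.0):
    a_c, W_c = rescale(a, W, c)
    same = np.allclose(forward(a, W, X), forward(a_c, W_c, X))
    print(f"c={c}: same outputs={same}, Hessian trace={hessian_trace(a_c, W_c, X):.3f}")
```

All three rescaled networks compute the same function, yet their Hessian traces differ. That is why the abstract states its guarantees for the flattest interpolators rather than for the flatness of an arbitrary representative.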