Abstract:
In general, obtaining theoretical guarantees for neural network training appears to be a hard problem.
Recent research has focused on studying this problem in the infinite-width limit, and two different theories have been developed: the mean-field (MF) limit theory and the kernel, or neural tangent kernel (NTK), limit theory.
We propose a general framework that provides a link between these seemingly distinct limit theories.
Out of the box, our framework gives rise to a discrete-time MF limit, a setting that, to the best of our knowledge, has not previously been explored in the literature.
We prove a convergence theorem for this limit and show that it provides a more accurate approximation of finite-width networks than the NTK limit when learning rates are not very small.
Our framework also suggests a different type of infinite-width limit that is covered by neither the MF nor the kernel limit theory.
We show that, for networks with more than two hidden layers, RMSProp training has a non-trivial MF limit, whereas GD training does not.
Overall, our framework demonstrates that both the MF and the NTK limits have considerable limitations in approximating finite-width neural networks, indicating the need to design more accurate infinite-width approximations for them.