Regularization matters: A nonparametric perspective on overparametrized neural network

13/04/2021

Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng

Abstract: Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the L2 estimation error as a function of the GD iteration; this bound stays away from zero unless the early-stopping time is chosen delicately. In turn, through a comprehensive analysis of L2-regularized GD trajectories, we prove that for overparametrized one-hidden-layer ReLU neural networks with L2 regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax-optimal rate of the L2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the L2 regularization approach improves training robustness and works for a wider range of neural networks.
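
To make the reference estimator in claim (1) concrete, the following is a minimal sketch of kernel ridge regression with a neural tangent kernel on noisy data. It is not the authors' code. The abstract does not state the kernel, so the closed form used here, h(s, t) = s·t (π − arccos(s·t)) / (2π) for unit-norm inputs, is an assumption: it is the form commonly used for a one-hidden-layer ReLU network whose first-layer weights are trained. The target function, noise level, sample size, and ridge values are hypothetical and chosen only for illustration.

```python
import numpy as np


def relu_ntk(S, T):
    """Assumed NTK of a one-hidden-layer ReLU network for unit-norm inputs:
    h(s, t) = s.t * (pi - arccos(s.t)) / (2 * pi)."""
    G = np.clip(S @ T.T, -1.0, 1.0)  # clip guards arccos against rounding error
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)


def krr_fit_predict(X_train, y_train, X_test, ridge):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y, then predict."""
    K = relu_ntk(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return relu_ntk(X_test, X_train) @ alpha


def make_data(n, noise_sd, rng):
    """Hypothetical setup: inputs on the unit circle, smooth target plus noise."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.column_stack([np.cos(theta), np.sin(theta)])
    f_true = np.sin(2.0 * theta)  # illustrative target, not from the paper
    y = f_true + noise_sd * rng.standard_normal(n)
    return X, y, f_true


rng = np.random.default_rng(0)
X, y, f_true = make_data(n=200, noise_sd=0.5, rng=rng)

for ridge in (1e-8, 1e-2, 1.0):
    fitted = krr_fit_predict(X, y, X, ridge)
    train_mse = np.mean((fitted - y) ** 2)         # fit to the noisy labels
    est_mse = np.mean((fitted - f_true) ** 2)      # error against the true target
    print(f"ridge={ridge:g}  train_mse={train_mse:.4f}  est_mse={est_mse:.4f}")
```

The two printed quantities separate fitting the noisy labels from recovering the target function, which is the distinction the abstract draws. A near-zero ridge roughly plays the role of interpolation, while a positive ridge corresponds to the regularized estimator.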
