Abstract:
We demonstrate two important new properties of the path-norm regularizer for shallow neural networks. First, despite its non-smoothness and non-convexity, it admits a closed-form proximal operator that can be computed efficiently, enabling the use of stochastic proximal-gradient-type methods for regularized empirical risk minimization. Second, it provides an upper bound on the Lipschitz constant of the network that is tighter than the trivial layer-wise product of Lipschitz constants, motivating its use for training networks robust to adversarial perturbations. Finally, in practical experiments we show that it yields a better robustness-accuracy trade-off than $\ell_1$-norm regularization or training with a layer-wise constraint on the Lipschitz constant.
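To fix ideas, here is a minimal sketch of the quantities involved; the notation ($f_\theta$, $K$, $v$, $w_k$, $\sigma$) is assumed for illustration and is the standard one-hidden-layer setting with a 1-Lipschitz activation and inputs measured in the $\ell_\infty$ norm, not necessarily the exact formulation used in the paper:
\[
f_\theta(x) = \sum_{k=1}^{K} v_k\,\sigma\!\left(\langle w_k, x\rangle\right),
\qquad
\|\theta\|_{\mathrm{path}} := \sum_{k=1}^{K} |v_k|\,\|w_k\|_1 ,
\]
\[
\mathrm{Lip}_{\infty}(f_\theta)
\;\le\;
\underbrace{\sum_{k=1}^{K} |v_k|\,\|w_k\|_1}_{\text{path-norm bound}}
\;\le\;
\underbrace{\|v\|_1 \,\max_{k}\|w_k\|_1}_{\text{layer-wise product bound}} .
\]
The second chain of inequalities illustrates the sense in which the path-norm controls the Lipschitz constant more tightly than the product of layer-wise Lipschitz constants.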