Abstract:
Arguably, the two most popular accelerated or momentum-based optimization
methods are Nesterov's accelerated gradient and Polyaks's heavy ball,
both corresponding to different discretizations of a particular second order
differential equation with a friction term. Such connections with
continuous-time dynamical systems have been instrumental in demystifying
acceleration phenomena in optimization.
Here we study structure-preserving discretizations for a certain class of
dissipative (conformal) Hamiltonian systems, allowing us to analyze the
symplectic structure of both Nesterov and heavy ball, besides providing
several new insights into these methods.
Moreover, we propose a new algorithm based on a dissipative relativistic
system that normalizes the momentum and may result in more stable/faster
optimization. Importantly, such a method generalizes both Nesterov and heavy
ball, each being recovered as distinct limiting cases, and has potential
advantages at no additional cost.