Abstract:
We study a general class of bilevel optimization problems, in which the upper-level objective is defined via the solution of a fixed point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and graph and recurrent neural networks. Typically, the gradient of the upper-level objective (the hypergradient) is not known explicitly or is hard to compute exactly, which has motivated interest in approximation methods. We investigate two popular approaches to compute the hypergradient, based on reverse mode iterative differentiation and approximate implicit differentiation. We present a unified analysis that, for the first time, allows a quantitative comparison of these methods, providing explicit bounds on their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. An extensive experimental comparison among the methods confirms the theoretical findings.
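For context, a brief sketch of the problem class in generic notation (the symbols $E$, $\Phi$, $w$, $\lambda$ below are illustrative and may not match the paper's own notation): the upper-level objective is $f(\lambda) = E(w(\lambda), \lambda)$, where the lower-level variable $w(\lambda)$ is defined implicitly as the solution of the fixed point equation $w(\lambda) = \Phi(w(\lambda), \lambda)$. Assuming $\Phi$ is a contraction in its first argument, the implicit function theorem yields the hypergradient
\[
\nabla f(\lambda) \;=\; \nabla_2 E(w(\lambda),\lambda) \;+\; \partial_2 \Phi(w(\lambda),\lambda)^{\top} \big(I - \partial_1 \Phi(w(\lambda),\lambda)^{\top}\big)^{-1} \nabla_1 E(w(\lambda),\lambda),
\]
which iterative differentiation approximates by unrolling a finite number of fixed point iterations, and approximate implicit differentiation approximates by solving the associated linear system inexactly (e.g. via conjugate gradient).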