Geometric Insights into the Convergence of Nonlinear TD Learning

Abstract: While there are convergence guarantees for temporal difference (TD) learning when using linear function approximators, the situation for nonlinear models is far less understood, and divergent examples are known. Here we take a first step towards extending theoretical convergence guarantees to TD learning with nonlinear function approximation. More precisely, we consider the expected learning dynamics of the TD(0) algorithm for value estimation. As the step-size converges to zero, these dynamics are defined by a nonlinear ODE which depends on the geometry of the space of function approximators, the structure of the underlying Markov chain, and their interaction. We find a set of function approximators that includes ReLU networks and has geometry amenable to TD learning regardless of environment, so that the solution performs about as well as linear TD in the worst case. Then, we show how environments that are more reversible induce dynamics that are better for TD learning and prove global convergence to the true value function for well-conditioned function approximators. Finally, we generalize a divergent counterexample to a family of divergent problems to demonstrate how the interaction between approximator and environment can go wrong and to motivate the assumptions needed to prove convergence.
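The expected TD(0) dynamics mentioned in the abstract can be illustrated numerically. The sketch below is not the authors' code: it assumes the standard form of the expected update, dθ/dt = J(θ)ᵀ D_μ (R + γ P V(θ) − V(θ)), where J is the Jacobian of the value parametrization and D_μ is the diagonal matrix of the stationary distribution. The 3-state chain, the rewards, the Euler step size, and all function names are hypothetical choices made for illustration. A tabular parametrization is used so that the limit can be checked against the true value function; any differentiable `value_fn` with a matching `jac_fn` could be substituted.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution mu of a row-stochastic matrix P (mu @ P = mu)."""
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    return mu / mu.sum()  # normalizing also fixes the sign of the eigenvector


def expected_td0_direction(theta, value_fn, jac_fn, P, R, gamma, mu):
    """Expected TD(0) update direction: J(theta)^T D_mu (R + gamma P V - V)."""
    V = value_fn(theta)
    td_error = R + gamma * (P @ V) - V
    return jac_fn(theta).T @ (mu * td_error)


def integrate(theta0, value_fn, jac_fn, P, R, gamma, step=0.5, n_steps=2000):
    """Forward-Euler integration of the expected dynamics (illustrative only)."""
    mu = stationary_distribution(P)
    theta = theta0.copy()
    for _ in range(n_steps):
        theta = theta + step * expected_td0_direction(
            theta, value_fn, jac_fn, P, R, gamma, mu
        )
    return theta


# Hypothetical toy problem: a reversible 3-state random walk.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Tabular parametrization: V(theta) = theta, so the Jacobian is the identity.
value_fn = lambda th: th
jac_fn = lambda th: np.eye(3)

theta = integrate(np.zeros(3), value_fn, jac_fn, P, R, gamma)
V_true = np.linalg.solve(np.eye(3) - gamma * P, R)  # solves V = R + gamma P V
print(np.max(np.abs(theta - V_true)))
```

In this tabular case the dynamics are linear and the printed gap should be close to zero. With a nonlinear `value_fn` the same loop applies, but convergence is no longer guaranteed in general, which is the question the abstract addresses.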

26/04/2020

David Brandfonbrener, Joan Bruna

Similar Papers

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

Leveraging Non-uniformity in First-order Non-convex Optimization

Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvari, Dale Schuurmans

Theory, RL, Decisions and Control Theory

Curvature-corrected learning dynamics in deep neural networks

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Yunwen Lei, Yiming Ying

generalization bounds, non-convex learning

Integrals over Gaussians under Linear Domain Constraints

Alexandra Gessner, Oindrila Kanjilal, Philipp Hennig

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

Supervised learning: no loss no cry

Richard Nock, Aditya Menon

A study of condition numbers for first-order optimization

Charles Guille-Escuret, Manuela Girotti, Baptiste Goujaud, Ioannis Mitliagkas

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler, Prateek Jain, Praneeth Netrapalli

Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD

Rémi Bardenet, Subhroshekhar Ghosh, Meixia Lin

optimization, machine learning

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat Erdogdu

optimization, machine learning

Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Mingzhi Dong, Xiaochen Yang, Rui Zhu, Yujiang Wang, Jing-Hao Xue

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Distributionally Robust Federated Averaging

Yuyang Deng, Mohammad Mahdi Kamani, Mehrdad Mahdavi

Continuous vs. Discrete Optimization of Deep Neural Networks

Omer Elkabetz, Nadav Cohen

theory, deep learning, optimization

Towards Understanding the Spectral Bias of Deep Learning

Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu

Machine Learning, Deep Learning, Kernel Methods

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Sampling with Trustworthy Constraints: A Variational Gradient Framework

Xingchao Liu, Xin Tong, Qiang Liu

optimization, machine learning, fairness, interpretability

High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails

Ashok Cutkosky, Harsh Mehta

deep learning, optimization

An efficient nonconvex reformulation of stagewise convex optimization problems

Rudy Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy Dvijotham

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge

Theory, Computational Learning Theory

Rectangular Flows for Manifold Learning

Anthony Caterini, Gabriel Loaiza-Ganem, Geoff Pleiss, John Cunningham

deep learning, optimization, generative model
