12/07/2020

Gradient Temporal-Difference Learning with Regularized Corrections

Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White

Keywords: Reinforcement Learning - General

Abstract: Value function learning remains a critical component of many reinforcement learning systems. Many algorithms are based on temporal difference (TD) updates, which have well-documented divergence issues, even though potentially sound alternatives like Gradient TD exist. Unsound approaches like Q-learning and TD remain popular because divergence seems rare in practice and these algorithms typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy-to-use and performant TD method, or a more complex algorithm that is more sound but harder to tune, less sample efficient, and underexplored for control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC) that attempts to balance ease of use, soundness, and performance. It behaves as well as TD when TD performs well, but is sound even in cases where TD diverges. We characterize the expected update for TDRC, show that it inherits soundness guarantees from Gradient TD, and show that it converges to the same solution as TD. Empirically, TDRC exhibits good performance and low parameter sensitivity across several problems.
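To make the idea of a "regularized correction" concrete, below is a minimal illustrative sketch of one linear TDRC update, assuming the standard two-timescale Gradient-TD (TDC) form with an l2 penalty added to the secondary-weight update. The variable names, the default regularization strength `beta`, and the exact update form are assumptions for illustration, not taken verbatim from the paper.

```python
# Sketch only: a linear TDRC-style update, assuming TDC plus an l2 penalty on h.
import numpy as np

def tdrc_update(w, h, x, r, x_next, gamma, alpha, beta=1.0):
    """One assumed TDRC update for linear value estimation v(s) = w^T x(s).

    w      : primary weights (value-function parameters)
    h      : secondary weights (estimate related to the expected TD error)
    x      : feature vector of the current state
    r      : observed reward
    x_next : feature vector of the next state
    gamma  : discount factor
    alpha  : step size
    beta   : regularization strength on h (the "regularized correction")
    """
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)  # TD error
    # TDC-style primary update: TD step plus a gradient-correction term.
    w = w + alpha * (delta * x - gamma * np.dot(h, x) * x_next)
    # Secondary update: as in TDC, but with an l2 penalty pulling h toward zero.
    h = h + alpha * ((delta - np.dot(h, x)) * x - beta * h)
    return w, h
```

Under this sketch, setting `beta = 0` would recover the TDC secondary update, while a very large `beta` drives `h` toward zero so the primary update approaches plain TD; the regularized correction is intended to sit between those extremes.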

The talk and the paper are published at the ICML 2020 virtual conference.
