Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Abstract: Distributional reinforcement learning (DRL) estimates the full distribution over future returns rather than only its mean, which captures the intrinsic uncertainty of MDPs more effectively. However, batch-based DRL algorithms cannot guarantee that the learned quantile curves are non-decreasing, especially in the early stage of training, which leads to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to enforce the monotonicity constraint within each sampled batch, and that can be incorporated into any well-known DRL algorithm. We demonstrate the validity of our method from both the theoretical and the implementation perspectives. Experiments on Atari 2600 games show that some state-of-the-art DRL algorithms with the non-crossing modification significantly outperform their baselines, with faster convergence and better test performance. In particular, our method effectively recovers the distribution information and thus dramatically increases exploration efficiency when the reward space is extremely sparse.
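
The monotonicity constraint described in the abstract can be illustrated with a minimal sketch. It assumes one common construction, a cumulative sum of softmax increments, which is not necessarily the authors' exact parameterization; the names noncrossing_quantiles, pinball_loss, scale and offset are hypothetical and chosen only for illustration.

```python
import numpy as np

def noncrossing_quantiles(logits, scale, offset):
    """Map unconstrained outputs to increasing quantile estimates.

    logits: (N,) unconstrained values, one per quantile level
    scale:  positive scalar (hypothetical), width of the return range
    offset: scalar (hypothetical), value just below the lowest quantile
    """
    # Softmax yields positive increments that sum to 1.
    z = logits - logits.max()
    increments = np.exp(z) / np.exp(z).sum()
    # A cumulative sum of positive increments cannot decrease,
    # so the resulting quantile curve cannot cross itself.
    return offset + scale * np.cumsum(increments)

def pinball_loss(pred, target, tau):
    """Quantile regression loss, averaged over quantile levels and target samples.

    pred:   (N,) quantile estimates
    target: (M,) sampled returns
    tau:    (N,) quantile levels in (0, 1)
    """
    diff = target[None, :] - pred[:, None]  # shape (N, M)
    return np.mean(np.maximum(tau[:, None] * diff, (tau[:, None] - 1.0) * diff))

# Usage: five quantile levels at the midpoints of equal-width bins.
rng = np.random.default_rng(0)
N = 5
tau = (2 * np.arange(N) + 1) / (2 * N)
q = noncrossing_quantiles(rng.normal(size=N), scale=10.0, offset=-5.0)
loss = pinball_loss(q, rng.normal(size=32), tau)
```

Because the ordering is built into the output layer rather than learned, the estimates stay sorted for any network output, including at the start of training, where unconstrained quantile heads are most likely to cross.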

18/07/2021

Fan Zhou, Jianing Wang, Xingdong Feng

Similar Papers

Generalizable Episodic Memory for Deep Reinforcement Learning

Hao Hu, Jianing Ye, Guangxiang Zhu, Zhizhou Ren, Chongjie Zhang

Data-Efficient Reinforcement Learning with Self-Predictive Representations

Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman

Keywords: Representation Learning, Self-Supervised Learning, Reinforcement Learning, Sample Efficiency

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Return-Based Contrastive Representation Learning for Reinforcement Learning

Guoqing Liu, Chuheng Zhang, Li Zhao, Tao Qin, Jinhua Zhu, Li Jian, Nenghai Yu, Tie-Yan Liu

Keywords: reinforcement learning, auxiliary task, contrastive learning, representation learning

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Disagreement-Regularized Imitation Learning

Kiante Brantley, Wen Sun, Mikael Henaff

Keywords: imitation learning, reinforcement learning, uncertainty

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Gal Dalal, Assaf Hallak, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

Keywords: theory, reinforcement learning and planning

Longitudinal Deep Kernel Gaussian Process Regression

Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar

Munchausen Reinforcement Learning

Nino Vieillard, Olivier Pietquin, Matthieu Geist

Distributional Reinforcement Learning via Moment Matching

Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh

ConQUR: Mitigating Delusional Bias in Deep Q-Learning

DiJia Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Improving KernelSHAP: Practical Shapley value estimation using linear regression

Ian Covert, Su-In Lee

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Daniel Brown, Scott Niekum, Russell Coleman, Ravi Srinivasan

Emphatic Algorithms for Deep Reinforcement Learning

Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Keywords: Reinforcement Learning and Planning, Deep RL

Fast Task Inference with Variational Intrinsic Successor Features

Steven Hansen, Will Dabney, Andre Barreto, David Warde-Farley, Tom Van de Wiele, Volodymyr Mnih

Keywords: Reinforcement Learning, Variational Intrinsic Control, Successor Features

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc Bellemare

Efficient Statistical Tests: A Neural Tangent Kernel Approach

Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba

Generative Semantic Hashing Enhanced via Boltzmann Machines

Lin Zheng, Qinliang Su, Dinghan Shen, Changyou Chen

Keywords: Generative Hashing, large-scale retrieval, training, Boltzmann Machines

Convex Regularization in Monte-Carlo Tree Search

Tuan Q Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Learning and Planning in Average-Reward Markov Decision Processes

Yi Wan, Abhishek Naik, Richard Sutton

On the Estimation Bias in Double Q-Learning

Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Task-Robust Model-Agnostic Meta-Learning

Liam Collins, Aryan Mokhtari, Sanjay Shakkottai
