Softmax Deep Double Deterministic Policy Gradients

Abstract: Deep Deterministic Policy Gradients (DDPG), a widely used actor-critic reinforcement learning algorithm for continuous control, suffers from the overestimation problem, which can degrade performance. Although the state-of-the-art Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm mitigates overestimation, it can introduce a large underestimation bias. In this paper, we propose using the Boltzmann softmax operator for value function estimation in continuous control. We first analyze the softmax operator theoretically in continuous action spaces. We then uncover an important property of the softmax operator in actor-critic algorithms: it helps smooth the optimization landscape, which sheds new light on the benefits of the operator. We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators, respectively, which effectively reduce the overestimation and underestimation biases. We conduct extensive experiments on challenging continuous control tasks, and the results show that SD3 outperforms state-of-the-art methods.

12/07/2020

Ling Pan, Qingpeng Cai, Longbo Huang
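
The Boltzmann softmax operator named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `boltzmann_softmax`, the parameter `beta`, and the use of a finite list of Q-values (for example, the critic evaluated at a handful of uniformly sampled actions, as a crude stand-in for an integral over a continuous action space) are all assumptions made for illustration.

```python
import math


def boltzmann_softmax(q_values, beta):
    """Softmax-weighted average of Q-values (hypothetical helper).

    Computes sum_i w_i * q_i with w_i proportional to exp(beta * q_i).
    The list q_values is assumed to hold Q(s, a_i) for sampled actions a_i.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the normalized weights unchanged.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


if __name__ == "__main__":
    sampled_q = [1.0, 2.0, 3.0]  # made-up values, not from the paper
    for beta in (0.0, 1.0, 50.0):
        print(beta, boltzmann_softmax(sampled_q, beta))
```

With `beta = 0` the operator reduces to the plain mean of the sampled values, and as `beta` grows it approaches their maximum, so for non-negative `beta` the estimate lies between the two.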

Similar Papers

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability

Suraj Srinivas, François Fleuret

Keywords: Interpretability, saliency maps, score-matching

Stabilizing Q Learning Via Soft Mellowmax Operator

Yaozhong Gan, Zhe Zhang, Xiaoyang Tan

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords: Theory, RL, Decisions and Control Theory

On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

Elena Smirnova, Elvis Dohmatob

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

Extrapolation for Large-batch Training in Deep Learning

Tao LIN, Lingjing Kong, Sebastian Stich, Martin Jaggi

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, Tamer Basar

Keywords: theory, optimization, reinforcement learning and planning

LAMDA: Label Matching Deep Domain Adaptation

Trung Le, Tuan Nguyen, Nhat Ho, Hung Bui, Dinh Phung

Keywords: Theory, Deep learning Theory

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords: deep policy gradient methods, deep reinforcement learning, trpo, ppo

Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks

Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft

Keywords: adversarial examples, adversarial attack, transferability, Nesterov accelerated gradient, scale invariance

Discovering symbolic policies with deep reinforcement learning

Mikel Landajuela Larma, Brenden Petersen, Sookyung Kim, Claudio Santiago, Ruben Glatt, Nathan Mundhenk, Jacob Pettit, Daniel Faissol

Keywords: Reinforcement Learning and Planning, Deep RL

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

Keywords: deep learning, optimization, reinforcement learning and planning

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions

Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Emmanouil Zampetakis

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia, Jun Yang, Qianchuan Zhao

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes
