Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Abstract: Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing off-policy RL algorithms based on Q-learning and actor-critic methods fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key ingredient missing from existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
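
The down-weighting idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the toy linear Q-function, the inverse-variance weight `beta / variance`, the clipping constant `max_weight`, and every function name below are assumptions chosen for illustration. Large-magnitude input features serve only as a stand-in for an OOD state-action pair, since they make the dropout samples disagree more.

```python
import random
import statistics


def dropout_q_samples(features, weights, drop_prob, n_samples, rng):
    """Run n_samples stochastic forward passes of a toy linear Q-function,
    drawing an independent dropout mask over the input features each time."""
    keep = 1.0 - drop_prob
    samples = []
    for _ in range(n_samples):
        q = 0.0
        for x, w in zip(features, weights):
            if rng.random() < keep:
                # Inverted dropout: rescale kept inputs so the mean is unchanged.
                q += (x / keep) * w
        samples.append(q)
    return samples


def uncertainty_weight(q_samples, beta=1.0, max_weight=1.0):
    """Map the variance of the dropout samples to a weight in (0, max_weight].
    Assumed form: inverse variance scaled by beta, clipped from above."""
    var = statistics.pvariance(q_samples)
    if var == 0.0:
        return max_weight
    return min(beta / var, max_weight)


def weighted_td_loss(q_pred, td_target, weight):
    """Squared TD error scaled by the uncertainty weight."""
    return weight * (q_pred - td_target) ** 2


rng = random.Random(0)
weights = [0.5, -0.3, 0.8]          # hypothetical Q-function parameters
in_dist = [0.1, 0.2, 0.1]           # stand-in for a well-covered input
ood = [5.0, -4.0, 6.0]              # stand-in for an out-of-distribution input

w_in = uncertainty_weight(dropout_q_samples(in_dist, weights, 0.2, 200, rng))
w_ood = uncertainty_weight(dropout_q_samples(ood, weights, 0.2, 200, rng))
print("in-distribution weight:", w_in)
print("OOD weight:", w_ood)
```

In this sketch the high-variance input receives a smaller weight, so its TD error contributes less to the loss; a real critic would use dropout inside a neural network rather than on the inputs of a linear model.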

06/12/2021

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

18/07/2021

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Josh Susskind, Jian Zhang, Russ Salakhutdinov, Hanlin Goh

Similar Papers

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Yiqin Yang, Xiaoteng Ma, Li Chenghao, Zewu Zheng, Qiyuan Zhang, Gao Huang, Jun Yang, Qianchuan Zhao

Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision

Johan Björck, Xiangyu Chen, Christopher De Sa, Carla Gomes, Kilian Weinberger

PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Yuda Song, Wen Sun

COMBO: Conservative Offline Model-Based Policy Optimization

Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn

deep learning, optimization, reinforcement learning and planning

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

Dan Fu, Mayee Chen, Frederic Sala, Sarah Hooper, Kayvon Fatahalian, Christopher Re

Improving Generalization in Reinforcement Learning with Mixture Regularization

Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng

Curriculum Offline Imitating Learning

Minghuan Liu, Hanye Zhao, Zhengyu Yang, Jian Shen, Weinan Zhang, Li Zhao, Tie-Yan Liu

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

Offline Meta-Reinforcement Learning with Advantage Weighting

Eric Mitchell, Rafael Rafailov, Xue Bin Peng, Sergey Levine, Chelsea Finn

Algorithms, Multitask, Transfer, and Meta Learning

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

deep learning, reinforcement learning and planning

Boosting Offline Reinforcement Learning with Residual Generative Modeling

Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li

Machine Learning Applications, Applications of Reinforcement Learning, Game Playing, Reinforcement Learning

IQ-Learn: Inverse soft-Q Learning for Imitation

Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano Ermon

optimization, reinforcement learning and planning, adversarial robustness and security

On the generalization properties of adversarial training

Yue Xing, Qifan Song, Guang Cheng

Learning to Reach Goals via Iterated Supervised Learning

Dibya Ghosh, Abhishek Gupta, Ashwin D Reddy, Justin Fu, Coline M Devin, Ben Eysenbach, Sergey Levine

goal reaching, reinforcement learning, goal-conditioned RL, behavior cloning

Goal-Aware Prediction: Learning to Model What Matters

Suraj Nair, Silvio Savarese, Chelsea Finn

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Learning to Learn Single Domain Generalization

Fengchun Qiao, Long Zhao, Xi Peng

single domain generalization, out-of-distribution generalization, meta-learning, adversarial training

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation

Nathan Kallus, Masatoshi Uehara

Self-Adaptive Training: beyond Empirical Risk Minimization

Lang Huang, Chao Zhang, Hongyang Zhang

Deep Learning -> Generative Models, Algorithms -> Semi-Supervised Learning

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn

Uniform Sampling over Episode Difficulty

Sébastien Arnold, Guneet Dhillon, Avinash Ravichandran, Stefano Soatto

meta learning, few shot learning
