06/12/2020

Trust the Model When It Is Confident: Masked Model-based Actor-Critic

Feiyang Pan, Jia He, Dandan Tu, Qing He

Keywords:

Abstract: It is a popular belief that model-based Reinforcement Learning (RL) is more sample-efficient than model-free RL, but in practice this is not always true because model errors can be over-weighted. In complex and noisy settings, model-based RL struggles to exploit the model if it does not know when to trust it. In this work, we find that better model usage can make a huge difference. We show theoretically that if the use of model-generated data is restricted to state-action pairs where the model error is small, the performance gap between model rollouts and real rollouts can be reduced. This motivates us to use model rollouts only when the model is confident about its predictions. We propose Masked Model-based Actor-Critic (M2AC), a novel policy optimization algorithm that maximizes a model-based lower bound of the true value function. M2AC implements a masking mechanism based on the model's uncertainty estimate to decide whether the model should be used, and consequently tends to give robust policy improvements. Experiments on continuous-control benchmarks demonstrate that M2AC performs strongly even when using long model rollouts in very noisy environments, and significantly outperforms previous state-of-the-art methods.
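
To make the masking idea concrete, below is a minimal Python sketch of an uncertainty-masked model rollout. It is not the authors' implementation: it assumes the dynamics model is a small ensemble, uses ensemble disagreement as the uncertainty estimate, and discards any model-generated transition whose uncertainty exceeds a hypothetical threshold tau. All names (ensemble_predict, masked_model_rollout, tau) are illustrative.

import numpy as np

def ensemble_predict(models, state, action):
    """Query every ensemble member and measure their disagreement."""
    preds = np.stack([m(state, action) for m in models])   # (ensemble, state_dim)
    mean = preds.mean(axis=0)
    # Average distance to the ensemble mean as a simple uncertainty proxy.
    uncertainty = np.mean(np.linalg.norm(preds - mean, axis=-1))
    return mean, uncertainty

def masked_model_rollout(models, reward_fn, policy, start_states, horizon, tau):
    """Roll the policy out in the learned model, keeping only confident transitions."""
    kept = []
    states = np.array(start_states, dtype=np.float64)
    for _ in range(horizon):
        next_states = []
        for s in states:
            a = policy(s)
            s_next, u = ensemble_predict(models, s, a)
            if u <= tau:  # mask: trust the model only when it is confident
                kept.append((s, a, reward_fn(s, a, s_next), s_next))
            next_states.append(s_next)
        states = np.array(next_states)
    return kept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim, ensemble_size = 3, 2, 5
    # Toy linear "dynamics models" standing in for learned neural networks.
    weights = [rng.normal(scale=0.1, size=(state_dim + action_dim, state_dim))
               for _ in range(ensemble_size)]
    models = [lambda s, a, W=W: np.concatenate([s, a]) @ W for W in weights]
    reward_fn = lambda s, a, s_next: -float(np.linalg.norm(s_next))
    policy = lambda s: rng.normal(size=action_dim)
    starts = rng.normal(size=(4, state_dim))
    data = masked_model_rollout(models, reward_fn, policy, starts, horizon=5, tau=0.05)
    print(f"kept {len(data)} of {4 * 5} model-generated transitions")

In this sketch the kept transitions would be appended to the model buffer used for actor-critic updates, while masked ones are simply dropped; the paper's actual criterion and lower-bound objective are more involved than this threshold rule.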

The talk and the paper were published at the NeurIPS 2020 virtual conference.
