12/07/2020

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Daniel Brown, Scott Niekum, Russell Coleman, Ravi Srinivasan

Keywords: Reinforcement Learning - Deep RL

Abstract: Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation learning problems by first pre-training a low-dimensional feature encoding via self-supervised tasks and then leveraging preferences over demonstrations to perform fast Bayesian inference via sampling. We evaluate our proposed approach on the task of learning to play Atari games from demonstrations, without access to the game score, and achieve state-of-the-art imitation learning performance. Furthermore, we demonstrate that our approach enables efficient high-confidence performance bounds for any evaluation policy. We show that these high-confidence performance bounds can be used to accurately rank the performance and risk of a variety of different evaluation policies, despite not having samples of the true reward function.
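The abstract only summarizes the approach, but the core idea it describes (fast Bayesian inference via sampling over preferences, on top of a frozen, pre-trained feature encoding) can be sketched as a random-walk Metropolis sampler over linear reward weights with a Bradley-Terry preference likelihood. The function names, the pairwise likelihood form, and the unit-norm constraint below are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import numpy as np

def preference_log_likelihood(w, feats_i, feats_j, beta=1.0):
    # Bradley-Terry log-likelihood that trajectory j is preferred over trajectory i,
    # given linear reward weights w and pre-computed trajectory feature sums
    # (assumed to come from the frozen, self-supervised encoder).
    r_i = beta * feats_i @ w
    r_j = beta * feats_j @ w
    return r_j - np.logaddexp(r_i, r_j)

def mcmc_reward_posterior(pref_pairs, feat_dim, n_samples=2000, step=0.05, seed=0):
    """Random-walk Metropolis over unit-norm reward weights (illustrative sketch).

    pref_pairs: list of (feats_i, feats_j) where trajectory j is preferred over i.
    Returns an array of posterior weight samples, shape (n_samples, feat_dim).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=feat_dim)
    w /= np.linalg.norm(w)
    log_post = sum(preference_log_likelihood(w, fi, fj) for fi, fj in pref_pairs)
    samples = []
    for _ in range(n_samples):
        proposal = w + step * rng.normal(size=feat_dim)
        proposal /= np.linalg.norm(proposal)  # keep weights on the unit sphere
        prop_log_post = sum(preference_log_likelihood(proposal, fi, fj)
                            for fi, fj in pref_pairs)
        if np.log(rng.random()) < prop_log_post - log_post:
            w, log_post = proposal, prop_log_post
        samples.append(w.copy())
    return np.array(samples)

def policy_return_bound(policy_feats, weight_samples, confidence=0.95):
    # Expected return of an evaluation policy under each posterior reward sample;
    # the (1 - confidence) quantile gives a high-confidence lower bound on performance.
    returns = weight_samples @ policy_feats
    return np.quantile(returns, 1.0 - confidence)
```

Under this sketch, the high-confidence performance bounds mentioned in the abstract would correspond to low quantiles of an evaluation policy's expected return across the posterior weight samples, which can then be used to rank policies by performance and risk.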


