Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Abstract: Imitation learning followed by reinforcement learning is a promising paradigm for solving complex control tasks sample-efficiently. However, learning from demonstrations often suffers from the covariate shift problem, which results in cascading errors of the learned policy. We introduce a notion of conservatively extrapolated value functions, which provably lead to policies with self-correction. We design an algorithm, Value Iteration with Negative Sampling (VINS), that practically learns such value functions with conservative extrapolation. We show that VINS can correct mistakes of the behavioral cloning policy on simulated robotics benchmark tasks. We also propose using VINS to initialize a reinforcement learning algorithm, which is shown to outperform prior work in sample efficiency.

16/11/2020

Yuping Luo, Huazhe Xu, Tengyu Ma

Similar Papers

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

Learning to Reach Goals via Iterated Supervised Learning

Dibya Ghosh, Abhishek Gupta, Ashwin D Reddy, Justin Fu, Coline M Devin, Ben Eysenbach, Sergey Levine

Keywords: goal reaching, reinforcement learning, goal-conditioned RL, behavior cloning

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

Bridging the Imitation Gap by Adaptive Insubordination

Luca Weihs, Unnat Jain, Iou-Jen Liu, Jordi Salvador, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing

Keywords: reinforcement learning and planning

Monotonic Robust Policy Optimization with Model Discrepancy

Yuankun Jiang, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong

Keywords: Reinforcement Learning and Planning

DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction

Aviral Kumar, Abhishek Gupta, Sergey Levine

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Junhong Shen, Lin F. Yang

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine

Keywords: Imitation Learning, Reinforcement Learning

Imitation Learning via Off-Policy Distribution Matching

Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

Keywords: reinforcement learning, deep learning, imitation learning, adversarial learning

Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization

Paul Barde, Julien Roy, Wonseok Jeon, Joelle Pineau, Chris Pal, Derek Nowrouzezahrai

Group Fairness by Probabilistic Modeling with Latent Fair Decisions

YooJung Choi, Meihua Dang, Guy Van den Broeck

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Ashraf Tahmasbi, Ellango Jothimurugesan, Srikanta Tirthapura, Phil Gibbons

Keywords: Algorithms, Online Learning Algorithms

A contraction approach to model-based reinforcement learning

Ting-Han Fan, Peter Ramadge

Reward-Constrained Behavior Cloning

Zhaorong Wang, Meng Wang, Jingqi Zhang, Yingfeng Chen, Chongjie Zhang

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Constraint Optimization

Learning Value Functions in Deep Policy Gradients using Residual Variance

Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

Niladri Chatterji, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords: theory, reinforcement learning and planning

Disagreement-Regularized Imitation Learning

Kiante Brantley, Wen Sun, Mikael Henaff

Keywords: imitation learning, reinforcement learning, uncertainty

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Keywords: reinforcement learning, bias and variance reduction

Representation Matters: Offline Pretraining for Sequential Decision Making

Mengjiao Yang, Ofir Nachum

Keywords: Reinforcement Learning and Planning

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

Wei Zou, Shujian Huang, Jun Xie, Xinyu Dai, Jiajun Chen

Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

Junhyun Nam, Hyuntak Cha, Sungsoo Ahn, Jaeho Lee, Jinwoo Shin

Ge Liu, Linglan Zhao, Wei Li, Dashan Guo, Xiangzhong Fang

Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Tim Seyde, Igor Gilitschenski, Wilko Schwarting, Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus