Positive-Unlabeled Reward Learning

16/11/2020

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

Keywords:

Abstract Paper Similar Papers

Abstract: Learning reward functions from data is a promising path towards achieving scalable Reinforcement Learning (RL) for robotics. However, a major challenge in training agents from learned reward models is that the agent can learn to exploit errors in the reward model to achieve high reward behaviors that do not correspond to the intended task. These reward delusions can lead to unintended and even dangerous behaviors. On the other hand, adversarial imitation learning frameworks (Ho et al., 2016) tend to suffer the opposite problem, where the discriminator learns to trivially distinguish agent and expert behavior, resulting in reward models that produce low reward signal regardless of the input state. In this paper, we connect these two classes of reward learning methods to positive-unlabeled (PU) learning, and show that by applying a large-scale PU learning algorithm to the reward learning problem, we can address both the reward under- and over-estimation problems simultaneously. Our approach drastically improves both GAIL and supervised reward learning, without any additional assumptions.

0

0

0

0

Share

This is an embedded video. Talk and the respective paper are published at CoRL 2020 virtual conference. If you are one of the authors of the paper and want to manage your upload, see the question "My papertalk has been externally embedded..." in the FAQ section.

Comments

Post Comment

no comments yet

Similar Papers

19/08/2021

Reward-Constrained Behavior Cloning

Zhaorong Wang, Meng Wang, Jingqi Zhang and
Yingfeng Chen, Chongjie Zhang

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Constraint Optimization

0

0

0

0

14:43

26/04/2020

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

Yuping Luo, Huazhe Xu, Tengyu Ma

Keywords Paper

imitation learning, model-based imitation learning, model-based RL, behavior cloning, covariate shift

0

0

0

0

4:38

26/04/2020

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine

Keywords Paper

Imitation Learning, Reinforcement Learning

0

0

0

0

4:38

06/12/2020

Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

Sreejith Balakrishnan, Quoc Phong Nguyen, Bryan Kian Hsiang Low, Harold Soh

Keywords Paper

0

0

0

0

3:22

12/07/2020

Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences

Somdeb Majumdar, Shauharda Khadka, Santiago Miret and
Stephen Mcaleer, Kagan Tumer

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

15:53

12/07/2020

Learning Human Objectives by Evaluating Hypothetical Behavior

Siddharth Reddy, Anca Dragan, Sergey Levine and
Shane Legg, Jan Leike

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

10:21

16/11/2020

f-IRL: Inverse Reinforcement Learning via State Marginal Matching

Tianwei Ni, Harshit Sikchi, Yufei Wang and
Tejus Gupta, Lisa Lee, Ben Eysenbach

Keywords Paper

0

0

0

0

5:07

06/12/2021

IQ-Learn: Inverse soft-Q Learning for Imitation

Divyansh Garg, Shuvam Chakraborty, Chris Cundy and
Jiaming Song, Stefano Ermon

Keywords Paper

optimization, reinforcement learning and planning, adversarial robustness and security

0

0

0

0

8:25

06/12/2021

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords Paper

reinforcement learning and planning, generative model

0

0

0

0

13:11

18/07/2021

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel Brown, Satvik Sharma and
Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords Paper

Social Aspects of Machine Learning, AI Safety

0

0

0

1

5:10

19/04/2021

Exploring supervised and unsupervised rewards in machine translation

Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia

Keywords Paper

0

0

0

0

10:52

18/07/2021

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Kevin Li, Abhishek Gupta, Ashwin D Reddy and
Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine

Keywords Paper

Reinforcement Learning and Planning, Deep RL

0

0

0

0

5:16

06/12/2021

Visual Adversarial Imitation Learning using Variational Models

Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn

Keywords Paper

theory, reinforcement learning and planning, adversarial robustness and security, representation learning

0

0

0

0

7:25

06/12/2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

Niladri Chatterji, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords Paper

theory, reinforcement learning and planning

0

0

0

0

15:06

06/12/2020

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Keywords Paper

0

0

0

0

3:23

26/08/2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan and
Zheng Wen, Wenlin Wang, Lawrence Carin

Keywords Paper

0

0

0

0

11:18

19/08/2021

Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness

Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Keywords Paper

Machine Learning, Reinforcement Learning, Deep Reinforcement Learning

0

0

0

0

14:48

02/02/2021

Stable Adversarial Learning under Distributional Shifts

Jiashuo Liu, Zheyan Shen, Peng Cui and
Linjun Zhou, Kun Kuang, Bo Li, Yishi Lin

Keywords Paper

0

0

0

0

14:30

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

0

0

0

0

13:14

03/05/2021

Hierarchical Reinforcement Learning by Discovering Intrinsic Options

Jesse Zhang, Haonan Yu, Wei Xu

Keywords Paper

reinforcement learning, unsupervised skill discovery, exploration, options, hierarchical reinforcement learning

0

0

0

0

4:58

16/11/2020

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

Keywords Paper

language tasks, optimization rewards, nlg tasks, question generation

0

0

0

0

11:34

03/05/2021

Learning What To Do by Simulating the Past

David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Keywords Paper

reinforcement learning, imitation learning, reward learning

0

0

0

0

5:09

02/02/2021

Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework

Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

Keywords Paper

0

0

0

0

16:03

03/05/2021

Mutual Information State Intrinsic Control

Rui Zhao, Yang Gao, Pieter Abbeel and
Volker Tresp, Wei Xu

Keywords Paper

Intrinsic Motivation, Intrinsic Reward, Intrinsically Motivated Reinforcement Learning, Deep Reinforcement Learning, Reinforcement Learning

0

0

0

0

9:55

02/02/2021

Reinforcement Learning Based Multi-Agent Resilient Control: From Deep Neural Networks to an Adaptive Law

Jian Hou, Fangyuan Wang, Lili Wang, Zhiyong Chen

Keywords Paper

0

0

0

0

15:48

18/07/2021

Monotonic Robust Policy Optimization with Model Discrepancy

yuankun jiang, Chenglin Li, Wenrui Dai and
Junni Zou, Hongkai Xiong

Keywords Paper

Reinforcement Learning and Planning

0

0

0

0

5:17

03/05/2021

Learning to Reach Goals via Iterated Supervised Learning

Dibya Ghosh, Abhishek Gupta, Ashwin D Reddy and
Justin Fu, Coline M Devin, Ben Eysenbach, Sergey Levine

Keywords Paper

goal reaching, reinforcement learning, goal-conditioned RL, behavior cloning

0

0

0

0

15:19

06/12/2021

Information Directed Reward Learning for Reinforcement Learning

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and
Kamil Ciosek, Andreas Krause

Keywords Paper

reinforcement learning and planning, active learning

0

0

0

0

11:47

06/12/2020

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia and
Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Paper

0

0

0

0

3:20

14/09/2020

Active deep Q-learning with demonstration

Si-An Chen,Hsuan-Tien Lin, Voot Tangkaratt, Masashi Sugiyam

Keywords Paper

0

0

0

0

13:42

02/02/2021

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Junhong Shen, Lin F. Yang

Keywords Paper

0

0

0

0

19:12

06/12/2021

Robust Deep Reinforcement Learning through Adversarial Loss

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and
Luca Daniel, Tsui-Wei Weng

Keywords Paper

reinforcement learning and planning, robustness, adversarial robustness and security

0

0

0

0

14:15

12/07/2020

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

Daniel Brown, Scott Niekum, Russell Coleman, Ravi Srinivasan

Keywords Paper

Reinforcement Learning - Deep RL

0

0

0

0

15:11

26/04/2020

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

Kristian Hartikainen, Xinyang Geng, Tuomas Haarnoja, Sergey Levine

Keywords Paper

reinforcement learning, semi-supervised learning, unsupervised learning, robotics, deep learning

0

0

0

0

5:07

02/02/2021

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Paper

0

0

0

0

14:40

26/04/2020

Intrinsic Motivation for Encouraging Synergistic Behavior

Rohan Chitnis, Shubham Tulsiani, Saurabh Gupta, Abhinav Gupta

Keywords Paper

reinforcement learning, intrinsic motivation, synergistic, robot manipulation

0

0

0

0

5:02

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords Paper

deep learning, reinforcement learning and planning, representation learning

0

0

0

0

14:52

02/02/2021

Advice-Guided Reinforcement Learning in a non-Markovian Environment

Daniel Neider, Jean-Raphael Gaglione, Ivan Gavran and
Ufuk Topcu, Bo Wu, Zhe Xu

Keywords Paper

0

0

0

0

18:07

06/12/2021

Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies

Tim Seyde, Igor Gilitschenski, Wilko Schwarting and
Bartolomeo Stellato, Martin Riedmiller, Markus Wulfmeier, Daniela Rus

Keywords Paper

reinforcement learning and planning

0

0

0

0

6:48

02/02/2021

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Paper

0

0

0

0

13:52