Learning Task-Distribution Reward Shaping with Meta-Learning

Abstract: Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment and accelerate Reinforcement Learning. However, designing shaping functions usually requires rich expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple tasks to solve. In this paper, we consider reward shaping on a distribution of tasks that share state spaces but not necessarily action spaces. We provide insights into optimal reward shaping, and propose a novel meta-learning framework to automatically learn such reward shaping to apply on newly sampled tasks. Theoretical analysis and extensive experiments establish us as the state-of-the-art in learning task-distribution reward shaping, outperforming previous such works (Konidaris and Barto 2006; Snel and Whiteson 2014). We further show that our method outperforms learning intrinsic rewards (Yang et al. 2019; Zheng et al. 2020), outperforms Rainbow (Hessel et al. 2018) in complex pixel-based CoinRun games, and is also better than hand-designed reward shaping on grids. While the goal of this paper is to learn reward shaping rather than to propose new general meta-learning algorithms as PEARL (Rakelly et al. 2019) or MQL (Fakoor et al. 2020), our framework based on MAML (Finn, Abbeel, and Levine 2017) also outperforms PEARL / MQL, and could combine with them for further improvement.

18/07/2021

Learning Task-Distribution Reward Shaping with Meta-Learning

Haosheng Zou, Tongzheng Ren, Dong Yan, Hang Su, Jun Zhu

Comments

Similar Papers

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel Brown, Satvik Sharma and Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords Abstract Paper

Social Aspects of Machine Learning, AI Safety

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

Keywords Abstract Paper

language tasks, optimization rewards, nlg tasks, question generation

Reward-Constrained Behavior Cloning

Zhaorong Wang, Meng Wang, Jingqi Zhang and Yingfeng Chen, Chongjie Zhang

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Constraint Optimization

Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization

Jianhao Wang, Zhizhou Ren, Beining Han and Jianing Ye, Chongjie Zhang

Keywords Abstract Paper

theory, reinforcement learning and planning

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Abstract Paper

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia and Jun Yang, Qianchuan Zhao

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning

Kevin Li, Abhishek Gupta, Ashwin D Reddy and Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine

Keywords Abstract Paper

Reinforcement Learning and Planning, Deep RL

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia and Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Abstract Paper

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

Keywords Abstract Paper

Positive-Unlabeled Reward Learning

Danfei Xu, Misha Denil

Keywords Abstract Paper

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords Abstract Paper

theory, reinforcement learning and planning, bandits, online learning

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Keywords Abstract Paper

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan and David Hsu, Nanning Zheng

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

Ben Eysenbach, Sergey Levine, Russ Salakhutdinov

Keywords Abstract Paper

reinforcement learning and planning, machine learning

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Andrea Zanette, Alessandro Lazaric, Mykel J Kochenderfer, Emma Brunskill

Keywords Abstract Paper

Goal-directed Generation of Discrete Structures with Conditional Generative Models

Amina Mollaysa, Brooks Paige, Alexandros Kalousis

Keywords Abstract Paper

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Charles Packer, Pieter Abbeel, Joseph Gonzalez

Keywords Abstract Paper

reinforcement learning and planning

MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Jin Zhang, Jianhao Wang, Hao Hu and Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

Keywords Abstract Paper

Algorithms, Multitask, Transfer, and Meta Learning

Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation

Haoxiang Wang, Han Zhao, Bo Li

Keywords Abstract Paper

Algorithms, Multitask, Transfer, and Meta Learning

State-only Imitation with Transition Dynamics Mismatch

Tanmay Gangwani, Jian Peng

Keywords Abstract Paper

Imitation learning, Reinforcement Learning, Inverse Reinforcement Learning

Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer

Keywords Abstract Paper

Zaynah Javed, Daniel Brown, Satvik Sharma and
Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca Dragan, Ken Goldberg

Keywords Paper

Keywords Paper

Zhaorong Wang, Meng Wang, Jingqi Zhang and
Yingfeng Chen, Chongjie Zhang

Keywords Paper

Jianhao Wang, Zhizhou Ren, Beining Han and
Jianing Ye, Chongjie Zhang

Keywords Paper

Keywords Paper

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Kevin Li, Abhishek Gupta, Ashwin D Reddy and
Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine

Keywords Paper

Yujing Hu, Weixun Wang, Hangtian Jia and
Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Hanbo Zhang, Site Bai, Xuguang Lan and
David Hsu, Nanning Zheng

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jin Zhang, Jianhao Wang, Hao Hu and
Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Sean Sinclair, Tianyu Wang, Gauri Jain and
Sid Banerjee, Christina Yu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and
Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Zheng Wen, Doina Precup, Morteza Ibrahimi and
Andre Barreto, Benjamin Van Roy, Satinder Singh

Keywords Paper