Abstract:
Incorporating high-level knowledge is an effective way to expedite
reinforcement learning (RL), especially for complex
tasks with sparse rewards. We investigate an RL problem
where the high-level knowledge is in the form of reward
machines, i.e., a type of Mealy machine that encodes non-
Markovian reward functions. We focus on a setting in which
this knowledge is not available a priori to the learning agent.
We develop an iterative algorithm that performs joint inference
of reward machines and policies for RL (more specifically,
q-learning). In each iteration, the algorithm maintains
a hypothesis reward machine and a sample of RL episodes.
It uses a separate q-function defined for each state of the
current hypothesis reward machine to determine the policy
and performs RL to update the q-functions. While performing
RL, the algorithm updates the sample by adding RL episodes
along which the obtained rewards are inconsistent with the rewards
predicted by the current hypothesis reward machine. In the
next iteration, the algorithm infers a new hypothesis reward
machine from the updated sample. Based on an equivalence
relation that we define between states of reward machines,
we transfer the q-functions between the hypothesis reward
machines in consecutive iterations. We prove that the proposed
algorithm converges almost surely to an optimal policy
in the limit. The experiments show that learning high-level
knowledge in the form of reward machines leads to fast convergence
to optimal policies in RL, whereas the baseline RL
methods fail to converge to optimal policies even after a substantial
number of training steps.
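
To make the iterative scheme concrete, the following is a minimal Python sketch of the loop described above. It assumes a Gym-style environment interface (env.reset(), env.step(a), env.actions), a labeling function labeler(s, a, s2) that maps transitions to high-level events, and an infer_reward_machine(sample) routine standing in for the inference step; the RewardMachine class and all names are illustrative assumptions, not the authors' implementation. For simplicity the sketch re-initializes the q-functions after each re-inference, whereas the algorithm in the paper transfers q-functions between equivalent states of consecutive hypothesis reward machines.

```python
import random
from collections import defaultdict


class RewardMachine:
    """Minimal Mealy-machine-style reward machine (illustrative, not the paper's code).

    states: RM states; u0: initial state;
    delta_u: dict mapping (u, label) -> next RM state;
    delta_r: dict mapping (u, label) -> reward.
    """
    def __init__(self, states, u0, delta_u, delta_r):
        self.states, self.u0 = list(states), u0
        self.delta_u, self.delta_r = delta_u, delta_r

    def step(self, u, label):
        # Unseen (state, label) pairs self-loop with zero reward.
        return self.delta_u.get((u, label), u), self.delta_r.get((u, label), 0.0)


def jirp_sketch(env, labeler, infer_reward_machine, episodes=1000,
                alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of joint inference of a reward machine and q-functions.

    Assumed interfaces (not from the paper): env.reset() -> state,
    env.step(a) -> (next_state, reward, done), env.actions (list of actions),
    labeler(s, a, s2) -> high-level label, and infer_reward_machine(sample)
    returning a RewardMachine consistent with the stored counterexamples.
    """
    hypothesis = RewardMachine(states=["u0"], u0="u0", delta_u={}, delta_r={})
    q = defaultdict(float)   # one q-function per RM state: keyed by (u, s, a)
    sample = []              # traces whose rewards contradict the hypothesis

    for _ in range(episodes):
        s, u = env.reset(), hypothesis.u0
        trace, consistent, done = [], True, False
        while not done:
            # Epsilon-greedy w.r.t. the q-function attached to the current RM state u.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: q[(u, s, b)])
            s2, r, done = env.step(a)
            label = labeler(s, a, s2)
            u2, r_hyp = hypothesis.step(u, label)
            trace.append((label, r))
            if r != r_hyp:   # observed reward disagrees with the hypothesis RM
                consistent = False
            # Standard q-learning update on the q-function of RM state u.
            target = r + (0.0 if done else gamma * max(q[(u2, s2, b)] for b in env.actions))
            q[(u, s, a)] += alpha * (target - q[(u, s, a)])
            s, u = s2, u2
        if not consistent:
            # Record the counterexample trace and re-infer the reward machine.
            sample.append(trace)
            hypothesis = infer_reward_machine(sample)
            # The paper transfers q-functions between equivalent RM states;
            # this sketch simply re-initializes them.
            q = defaultdict(float)
    return hypothesis, q
```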