Reinforcement Learning with Trajectory Feedback

02/02/2021

Yonathan Efroni, Nadav Merlis, Shie Mannor

Abstract: The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm.
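To make the least-squares idea in the abstract concrete, here is a minimal sketch and not the authors' algorithm. It assumes a tabular problem in which each episode is summarized by a vector counting how often every state-action pair was visited, and the only feedback is the noisy sum of rewards along that episode. The function names, the ridge parameter `reg`, and the uniformly random visits used in the simulation are all illustrative assumptions; the exploration and regret machinery of the paper is not shown.

```python
import numpy as np


def estimate_rewards(visit_counts, returns, reg=1.0):
    """Ridge least-squares estimate of per-(state, action) rewards.

    visit_counts: (K, N) array; row k counts visits to each of the N
                  state-action pairs during episode k.
    returns:      (K,) array; entry k is the trajectory score, i.e. the
                  (noisy) sum of rewards collected in episode k.
    reg:          ridge regularizer (hypothetical default, not from the paper).
    """
    X = np.asarray(visit_counts, dtype=float)
    y = np.asarray(returns, dtype=float)
    gram = X.T @ X + reg * np.eye(X.shape[1])
    # Solve (X^T X + reg * I) r = X^T y for the reward vector r.
    return np.linalg.solve(gram, X.T @ y)


def simulate(num_episodes=2000, num_pairs=6, horizon=5, noise_std=0.1, seed=0):
    """Generate synthetic trajectory feedback (assumed setup for illustration)."""
    rng = np.random.default_rng(seed)
    true_r = rng.uniform(0.0, 1.0, size=num_pairs)
    # Uniformly random visits stand in for a real policy and transition model.
    visits = rng.integers(0, num_pairs, size=(num_episodes, horizon))
    X = np.stack([np.bincount(v, minlength=num_pairs) for v in visits]).astype(float)
    # Only the per-trajectory sum of rewards (plus noise) is observed.
    y = X @ true_r + noise_std * rng.normal(size=num_episodes)
    return true_r, X, y


true_r, X, y = simulate()
r_hat = estimate_rewards(X, y)
print("max abs error:", np.max(np.abs(r_hat - true_r)))
```

With enough varied trajectories, the individual rewards become identifiable even though they are never observed one at a time, which is the intuition behind estimating the reward by regression on visitation counts.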

The video of this talk cannot be embedded. You can watch it here:

https://slideslive.com/38948557

The talk and the corresponding paper were published at the AAAI 2021 virtual conference.

Similar Papers

02/02/2021

Advice-Guided Reinforcement Learning in a non-Markovian Environment

Daniel Neider, Jean-Raphael Gaglione, Ivan Gavran, Ufuk Topcu, Bo Wu, Zhe Xu

18:07

06/12/2021

Learning in Non-Cooperative Configurable Markov Decision Processes

Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli

Keywords: reinforcement learning and planning, online learning

14:14

06/12/2021

Information Directed Reward Learning for Reinforcement Learning

David Lindner, Matteo Turchetta, Sebastian Tschiatschek, Kamil Ciosek, Andreas Krause

Keywords: reinforcement learning and planning, active learning

11:47

12/07/2020

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Keywords: Online Learning, Active Learning, and Bandits

13:26

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords: Reinforcement Learning and Planning

18:34

06/12/2021

Adversarial Intrinsic Motivation for Reinforcement Learning

Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

Keywords: reinforcement learning and planning, generative model

13:11

19/04/2021

Exploring supervised and unsupervised rewards in machine translation

Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia

10:52

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords: Reinforcement Learning - General

16:14

03/08/2020

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Ellen Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel Burdick

7:57

02/02/2021

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

Haobo Jiang, Jin Xie, Jian Yang

13:27

06/12/2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

2:41

04/08/2021

Corruption-robust exploration in episodic reinforcement learning

Thodoris Lykouris, Max Simchowitz, Alex Slivkins, Wen Sun

18:27

12/07/2020

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

Gellért Weisz, Tor Lattimore, Csaba Szepesvari

Keywords: Reinforcement Learning - Theory

15:20

06/12/2020

On Reward-Free Reinforcement Learning with Linear Function Approximation

Ruosong Wang, Simon Du, Lin Yang, Russ Salakhutdinov

3:12

06/12/2021

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords: reinforcement learning and planning

13:24

06/12/2020

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

3:20

26/04/2020

Ranking Policy Gradient

Kaixiang Lin, Jiayu Zhou

Keywords: Sample-efficient reinforcement learning, off-policy learning

5:43

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits

12:07

18/07/2021

Provably Efficient Learning of Transferable Rewards

Alberto Maria Metelli, Giorgia Ramponi, Alessandro Concetti, Marcello Restelli

Keywords: Optimization, Convex Optimization, Reinforcement Learning and Planning, Combinatorial Optimization

5:14

18/07/2021

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

Yifang Chen, Simon Du, Kevin Jamieson

Keywords: Optimization, Non-Convex Optimization, Theory, Online Learning Theory

5:20

06/12/2020

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai

3:24

25/07/2020

Sampler design for implicit feedback data by noisy-label robust learning

Wenhui Yu, Zheng Qin

Keywords: collaborative filtering, bayesian point-wise optimization, noisy-label robust learning, negative sampling, item recommendation

12:25

18/07/2021

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

Thomas Mesnard, Theo Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Thomas Stepleton, Nicolas Heess, Arthur Guez, Eric Moulines, Marcus Hutter, Lars Buesing, Remi Munos

Keywords: Reinforcement Learning and Planning, Deep RL

5:58

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

3:18

26/04/2020

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine

Keywords: Imitation Learning, Reinforcement Learning

4:38

06/12/2020

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Zihan Zhang, Yuan Zhou, Xiangyang Ji

3:11

13/04/2021

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo Jovanovic

3:07

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

02/02/2021

A One-Size-Fits-All Solution to Conservative Bandit Problems

Yihan Du, Siwei Wang, Longbo Huang

14:13

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

06/12/2021

On the Theory of Reinforcement Learning with Once-per-Episode Feedback

Niladri Chatterji, Aldo Pacchiano, Peter Bartlett, Michael Jordan

Keywords: theory, reinforcement learning and planning

15:06

12/07/2020

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

Keywords: Reinforcement Learning - Theory

14:22

06/12/2021

Identifiability in inverse reinforcement learning

Haoyang Cao, Samuel Cohen, Lukasz Szpruch

Keywords: reinforcement learning and planning

15:07

06/12/2020

Deep Inverse Q-learning with Constraints

Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

3:14

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

14:36

26/04/2020

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Keywords: deep policy gradient methods, deep reinforcement learning, trpo, ppo

20:41

03/08/2020

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

7:49

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords: bandits

11:33