Acting in Delayed Environments with Non-Stationary Markov Policies

03/05/2021

Esther Derman, Gal Dalal, Shie Mannor

Keywords: reinforcement learning, delay

Abstract: The standard Markov Decision Process (MDP) formulation hinges on the assumption that an action is executed immediately after it is chosen. However, this assumption is often unrealistic and can lead to catastrophic failures in applications such as robotic manipulation, cloud computing, and finance. We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps. The brute-force state-augmentation baseline, in which the state is concatenated with the last $m$ committed actions, suffers from complexity that is exponential in $m$, as we show for policy iteration. We then prove that with execution delay, deterministic Markov policies in the original state space are sufficient for attaining maximal reward, but they need to be non-stationary. Stationary Markov policies, in contrast, are sub-optimal in general. Consequently, we devise a non-stationary, Q-learning-style, model-based algorithm that solves delayed-execution tasks without resorting to state augmentation. Experiments on tabular, physical, and Atari domains reveal that it converges quickly to high performance even for substantial delays, while standard approaches that either ignore the delay or rely on state augmentation struggle or fail due to divergence. The code is available at https://github.com/galdl/rl_delay_basic.git.
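The execution-delay setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' code from the repository above; the names `DelayedEnv`, `chain_step`, and `augmented_state_count` are hypothetical, and the toy chain dynamics and the default action used to fill the initial queue are assumptions made only for illustration. The wrapper queues each committed action and executes it $m$ steps later, and the helper counts the states of the brute-force augmented MDP, $|S| \cdot |A|^m$, which grows exponentially in $m$.

```python
from collections import deque


class DelayedEnv:
    """Hypothetical wrapper: each committed action is executed `delay` steps later."""

    def __init__(self, step_fn, initial_state, delay, default_action):
        self.step_fn = step_fn
        self.state = initial_state
        # Committed-but-not-yet-executed actions. Pre-filling with a default
        # action is an assumption of this sketch.
        self.pending = deque([default_action] * delay)

    def step(self, action):
        # Commit the new action, then execute the oldest pending one.
        self.pending.append(action)
        executed = self.pending.popleft()
        self.state, reward = self.step_fn(self.state, executed)
        return self.state, reward, executed


def chain_step(state, action):
    """Toy dynamics (assumed): move along an integer chain, reward = new position."""
    next_state = state + action
    return next_state, float(next_state)


def augmented_state_count(num_states, num_actions, delay):
    """Size of the augmented state space (state plus last `delay` actions)."""
    return num_states * num_actions ** delay


if __name__ == "__main__":
    env = DelayedEnv(chain_step, initial_state=0, delay=2, default_action=0)
    for _ in range(4):
        print(env.step(1))
    for m in (0, 1, 5, 10):
        print(m, augmented_state_count(num_states=10, num_actions=4, delay=m))
```

With a delay of 2, the first two calls to `step(1)` still execute the default action, so the agent only sees the effect of its first choice on the third step. The augmented-state count makes the cost of concatenating the pending actions to the state concrete.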

The talk and the corresponding paper were published at the ICLR 2021 virtual conference.

Similar Papers

18/07/2021

Adapting to Delays and Data in Adversarial Multi-Armed Bandits

András György, Pooria Joulani

Keywords Paper

Deep Learning, Attention Models, Applications, Time Series Analysis; Deep Learning, Predictive Models, Reinforcement Learning and Planning, Bandits

6:18

06/12/2021

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Seohong Park, Jaekyeom Kim, Gunhee Kim

Keywords Paper

reinforcement learning and planning

8:53

18/07/2021

On Limited-Memory Subsampling Strategies for Bandits

Dorian Baudry, Yoan Russac, Olivier Cappé

Keywords Paper

Reinforcement Learning and Planning, Bandits

5:33

06/12/2021

Automated Dynamic Mechanism Design

Hanrui Zhang, Vincent Conitzer

Keywords Paper

14:35

06/12/2020

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Paper

3:18

26/04/2020

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry

Keywords Paper

implicit bias, stability, neural networks, generalization gap, asynchronous SGD

5:03

12/07/2020

Linear bandits with Stochastic Delayed Feedback

Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Keywords Paper

Online Learning, Active Learning, and Bandits

13:25

13/04/2021

Online model selection for reinforcement learning with function approximation

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Keywords Paper

3:15

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

meta learning, bandits

14:58

06/12/2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Keywords Paper

3:17

12/07/2020

Non-Stationary Bandits with Intermediate Observations

Claire Vernade, András György, Timothy Mann

Keywords Paper

Online Learning, Active Learning, and Bandits

14:40

06/12/2021

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

Tiancheng Jin, Longbo Huang, Haipeng Luo

Keywords Paper

reinforcement learning and planning, online learning

19:08

26/08/2020

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Shuhang Chen, Adithya Devraj, Ana Busic, Sean Meyn

Keywords Paper

10:37

06/12/2021

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurelien Bibaut, Nathan Kallus, Maria Dimakopoulou, Antoine Chambaz, Mark van der Laan

Keywords Paper

theory, reinforcement learning and planning, machine learning, bandits

16:07

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords Paper

theory, reinforcement learning and planning

15:32

18/11/2020

Run2Survive: A decision-theoretic approach to algorithm selection based on survival analysis

Alexander Tornede, Marcel Wever, Stefan Werner, Felix Mohr, Eyke Hüllermeier

Keywords Paper

11:33

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Paper

optimization, reinforcement learning and planning, bandits

15:17

06/12/2021

Model-Based Domain Generalization

Alexander Robey, George J. Pappas, Hamed Hassani

Keywords Paper

theory, deep learning, optimization, robustness, domain adaptation

15:08

06/12/2021

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

Keywords Paper

theory, deep learning, optimization, robustness, adversarial robustness and security

14:48

12/07/2020

Optimizing for the Future in Non-Stationary MDPs

Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip Thomas

Keywords Paper

Reinforcement Learning - General

15:37

06/12/2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

Keywords Paper

3:09

18/07/2021

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Tianyuan Jin, Jing Tang, Pan Xu, Keke Huang, Xiaokui Xiao, Quanquan Gu

Keywords Paper

Reinforcement Learning and Planning, Bandits

5:19

06/12/2021

Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints

Maura Pintor, Fabio Roli, Wieland Brendel, Battista Biggio

Keywords Paper

optimization, machine learning, robustness, adversarial robustness and security, vision

11:35

18/07/2021

Nondeterminism and Instability in Neural Network Optimization

Cecilia Summers, Michael J Dinneen

Keywords Paper

Deep Learning, Optimization for Deep Networks

5:12

06/12/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Tiancheng Jin, Haipeng Luo

Keywords Paper

3:39

06/12/2021

Twice regularized MDPs and the equivalence between robustness and regularization

Esther Derman, Matthieu Geist, Shie Mannor

Keywords Paper

optimization, reinforcement learning and planning, robustness

14:19

06/12/2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Keywords Paper

3:13

03/05/2021

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Lanqing Li, Rui Yang, Dijun Luo

Keywords Paper

distance metric learning, offline/batch reinforcement learning, meta-reinforcement learning, contrastive learning, multi-task reinforcement learning

6:21

12/07/2020

Tightening Exploration in Upper Confidence Reinforcement Learning

Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

Keywords Paper

Reinforcement Learning - General

16:14

13/04/2021

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn

Keywords Paper

3:02

09/07/2020

Estimating Principal Components under Adversarial Perturbations

Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan

Keywords Paper

Unsupervised and semi-supervised learning, Adversarial learning and robustness

15:40

15/11/2020

Testing Consensus Implementations using Communication Closure

Cezara Drăgoi, Constantin Enea, Burcu Kulahcioglu Ozkan, Rupak Majumdar, Filip Niksic

Keywords Paper

Distributed consensus, Communication closure, Randomized testing

15:19

06/12/2021

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

Max Ryabinin, Andrey Malinin, Mark Gales

Keywords Paper

machine learning

12:36

03/05/2021

Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy

Akinori Ebihara, Taiki Miyagawa, Kazuyuki Sakurai, Hitoshi Imaoka

Keywords Paper

Density ratio estimation, Early classification, Sequential probability ratio test

9:55

18/07/2021

Training Recurrent Neural Networks via Forward Propagation Through Time

Anil Kag, Venkatesh Saligrama

Keywords Paper

Algorithms, Supervised Learning

5:20

02/02/2021

GaussianPath: A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Paper

13:52

18/07/2021

Learning from History for Byzantine Robust Optimization

Praneeth Karimireddy, Lie He, Martin Jaggi

Keywords Paper

Optimization, Non-Convex Optimization

5:01

03/05/2021

Fidelity-based Deep Adiabatic Scheduling

Eli Ovits, Lior Wolf

Keywords Paper

9:45

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

Keywords Paper

3:42

22/06/2020

Efficiently learning structured distributions from untrusted batches

Sitan Chen, Jerry Li, Ankur Moitra

Keywords Paper

sum-of-squares, federated learning, VC complexity, Robust statistics

24:38