Abstract:
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information
reward feedback and unknown fixed transition kernels. We propose two model-free policy optimization algorithms,
POWER and POWER++, and establish guarantees for their dynamic regret. Compared
with the classical notion of static regret, dynamic regret is a stronger
performance measure, as it explicitly accounts for the non-stationarity of
environments. The dynamic regret attained by the proposed
algorithms interpolates between different regimes of non-stationarity,
and moreover satisfies a notion of adaptive (near-)optimality, in
the sense that it matches the (near-)optimal static regret achievable under slow-changing environments.
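As a point of reference, dynamic and static regret are commonly formalized as follows; the notation (number of episodes $K$, expected total reward $V^{\pi}_{k}$ of a policy $\pi$ under the episode-$k$ rewards and the fixed transition kernel, and executed policies $\pi_k$) is a standard convention assumed here rather than fixed by the abstract:
\[
\mathrm{D\text{-}Regret}(K) \;=\; \sum_{k=1}^{K} \Big( \sup_{\pi} V^{\pi}_{k} \;-\; V^{\pi_k}_{k} \Big),
\qquad
\mathrm{Regret}(K) \;=\; \sup_{\pi} \sum_{k=1}^{K} \Big( V^{\pi}_{k} \;-\; V^{\pi_k}_{k} \Big).
\]
Since the comparator in the dynamic regret may change from episode to episode, dynamic regret always upper-bounds static regret, which makes it the more demanding benchmark.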
The dynamic regret bound features two components: one arising
from exploration, which accounts for the uncertainty of the transition kernels,
and the other arising from adaptation, which handles the non-stationarity of the
environment.
Specifically, we show that POWER++ improves over POWER on the
second component of the dynamic regret by actively adapting
to non-stationarity through prediction.
To the best of our knowledge,
our work is the first dynamic regret analysis of model-free RL algorithms
in non-stationary environments.