A Reduction from Reinforcement Learning to No-Regret Online Learning

26/08/2020

Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

Abstract: We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which 'any' online algorithm with sublinear regret can generate policies with provable performance guarantees. This new perspective decouples the RL problem into two parts: regret minimization and function approximation. The first part admits a standard online-learning analysis, and the second part can be quantified independently of the learning algorithm. Therefore, the proposed reduction can be used as a tool to systematically design new RL algorithms. We demonstrate this idea by devising a simple RL algorithm based on mirror descent and the generative-model oracle. For any $\gamma$-discounted tabular RL problem, with probability at least $1-\delta$, it learns an $\epsilon$-optimal policy using at most $\tilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\log(\frac{1}{\delta})}{(1-\gamma)^4\epsilon^2}\right)$ samples. Furthermore, this algorithm admits a direct extension to linearly parameterized function approximators for large-scale applications, with computation and sample complexities independent of $|\mathcal{S}|$ and $|\mathcal{A}|$, though at the cost of potential approximation bias.
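To get a feel for how the sample bound in the abstract scales, the sketch below evaluates only its leading term: the number of states times the number of actions times log(1/delta), divided by (1 - gamma)^4 times epsilon^2. The function name `leading_term` and the example problem sizes are illustrative assumptions, not taken from the paper. The tilde-O notation hides constants and polylogarithmic factors, so the value indicates scaling only, not an actual sample count.

```python
import math


def leading_term(num_states: int, num_actions: int, gamma: float,
                 epsilon: float, delta: float) -> float:
    """Leading term of the abstract's bound, without hidden constants or log factors."""
    if not (0.0 <= gamma < 1.0):
        raise ValueError("gamma must lie in [0, 1)")
    if epsilon <= 0.0 or not (0.0 < delta < 1.0):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    numerator = num_states * num_actions * math.log(1.0 / delta)
    denominator = (1.0 - gamma) ** 4 * epsilon ** 2
    return numerator / denominator


# Hypothetical problem sizes, chosen only to show the scaling.
base = leading_term(num_states=10, num_actions=2, gamma=0.9, epsilon=0.1, delta=0.05)
halved_eps = leading_term(num_states=10, num_actions=2, gamma=0.9, epsilon=0.05, delta=0.05)
print(f"ratio after halving epsilon: {halved_eps / base:.2f}")
```

Halving epsilon quadruples the term, while the fourth-power dependence on 1/(1 - gamma) means the effective horizon dominates the cost as gamma approaches 1.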

Talk and the respective paper are published at the AISTATS 2020 virtual conference.

Similar Papers

12/07/2020

Model-Based Reinforcement Learning with Value-Targeted Regression

Zeyu Jia, Lin Yang, Csaba Szepesvari, Mengdi Wang, Alex Ayoub

Keywords: Reinforcement Learning - Theory

10:44

04/08/2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture MDPs

Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

16:33

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

06/12/2021

Near-Optimal Offline Reinforcement Learning via Double Variance Reduction

Ming Yin, Yu Bai, Yu-Xiang Wang

Keywords: theory, optimization, reinforcement learning and planning

8:57

06/12/2020

Geometric Exploration for Online Control

Orestis Plevrakis, Elad Hazan

3:21

26/04/2020

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu

Keywords: Policy Gradient, Reinforcement Learning, Sample Efficiency

4:40

18/07/2021

Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

Zhang Zihan, Yuan Zhou, Xiangyang Ji

Keywords: Reinforcement Learning and Planning

5:03

06/12/2020

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan

3:42

18/07/2021

Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Tom Fei, Zhuoran Yang, Zhaoran Wang

Keywords: Reinforcement Learning and Planning

17:05

06/12/2020

Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Simon Du, Jason Lee, Gaurav Mahajan, Ruosong Wang

1:56

06/12/2020

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

3:22

06/12/2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

Keywords: theory, optimization, reinforcement learning and planning

10:57

06/12/2020

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui, Lin Yang

Keywords: Algorithms -> Semi-Supervised Learning; Deep Learning -> Deep Autoencoders; Deep Learning -> Generative Models, Probabilistic Methods -> Variational Inference

3:25

06/12/2021

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

Keywords: theory, optimization, reinforcement learning and planning

8:44

06/12/2021

Best-case lower bounds in online learning

Cristóbal Guzmán, Nishant Mehta, Ali Mortazavi

Keywords: theory, optimization, online learning, fairness

14:58

18/07/2021

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords: Reinforcement Learning and Planning

18:34

13/04/2021

Logistic Q-Learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

2:44

06/12/2020

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

3:25

06/12/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords: optimization, reinforcement learning and planning, bandits

15:17

06/12/2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

2:41

06/12/2021

Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning

Xin Zhang, Zhuqing Liu, Jia Liu, Zhengyuan Zhu, Songtao Lu

Keywords: theory, optimization, reinforcement learning and planning

14:54

26/04/2020

Reanalysis of Variance Reduced Temporal Difference Learning

Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Keywords: Reinforcement Learning, TD learning, Markovian sample, Variance Reduction

4:29

06/12/2021

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Leme, Jon Schneider

Keywords: bandits, online learning

7:29

06/12/2020

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow

3:16

02/02/2021

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

17:13

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

14:48

26/04/2020

CAQL: Continuous Action Q-Learning

Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier

Keywords: Reinforcement learning (RL), DQN, Continuous control, Mixed-Integer Programming (MIP)

5:36

06/12/2020

Adaptive Online Estimation of Piecewise Polynomial Trends

Dheeraj Baby, Yu-Xiang Wang

3:12

18/07/2021

Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Keywords: Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

17:27

04/08/2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Chen-Yu Wei, Haipeng Luo

17:25

09/07/2020

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael Jordan

Keywords: Reinforcement learning

13:04

18/07/2021

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

Xu Cai, Jonathan Scarlett

Keywords: Applications, Natural Language Processing, Applications, Network Analysis, Reinforcement Learning and Planning, Bandits

4:19

12/07/2020

Parameter-free, Dynamic, and Strongly-Adaptive Online Learning

Ashok Cutkosky

Keywords: Online Learning, Active Learning, and Bandits

14:58

06/12/2020

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai

3:24

06/12/2021

A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum

Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang

Keywords: optimization

9:47

06/12/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

Keywords: theory, reinforcement learning and planning

15:32

12/07/2020

Efficiently Solving MDPs with Stochastic Mirror Descent

Yujia Jin, Aaron Sidford

Keywords: Optimization - Convex

14:56

06/12/2020

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Raphaël Berthier, Francis Bach, Pierre Gaillard

Keywords: Optimization -> Non-Convex Optimization, Deep Learning -> Optimization for Deep Networks

3:05

26/04/2020

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang

Keywords: theory, reinforcement learning, sample complexity

3:25

09/07/2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Keywords: Reinforcement learning

12:46