Online learning in MDPs with linear function approximation and bandit feedback.

06/12/2021

Gergely Neu, Julia Olkhovskaya

Keywords: reinforcement learning and planning, bandits, online learning

Abstract: We consider the problem of online learning in an episodic Markov decision process, where the reward function is allowed to change between episodes in an adversarial manner and the learner only observes the rewards associated with its actions. We assume that rewards and the transition function can be represented as linear functions in terms of a known low-dimensional feature map, which allows us to consider the setting where the state space is arbitrarily large. We also assume that the learner has perfect knowledge of the MDP dynamics. Our main contribution is an algorithm whose expected regret after $T$ episodes is bounded by $\widetilde{\mathcal{O}}(\sqrt{dHT})$, where $H$ is the number of steps in each episode and $d$ is the dimensionality of the feature map.
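
For readers new to the setting, the quantities named in the abstract can be written out roughly as follows. This is only a sketch in assumed notation: the symbols $\varphi$ (feature map), $\theta_{t,h}$ (reward parameter in episode $t$, step $h$), $\mu_h$ (transition parameter), $\pi_t$ (the learner's policy in episode $t$) and $V_t^{\pi}$ (expected total reward of policy $\pi$ in episode $t$) do not appear on this page, and the paper's own definitions may differ in detail.

```latex
% Assumed notation, not taken from the page: \varphi, \theta_{t,h}, \mu_h, \pi_t, V_t^{\pi}.
% Linear structure of rewards and transitions in a known feature map \varphi(x,a) \in \mathbb{R}^d:
\begin{align*}
  r_{t,h}(x,a) &= \langle \varphi(x,a), \theta_{t,h} \rangle,
  &
  P_h(x' \mid x,a) &= \langle \varphi(x,a), \mu_h(x') \rangle .
\end{align*}
% Expected total reward of a policy \pi in episode t (H steps per episode):
\[
  V_t^{\pi} = \mathbb{E}_{\pi}\!\left[ \sum_{h=1}^{H} r_{t,h}(x_h, a_h) \right].
\]
% Expected regret after T episodes against the best fixed policy in hindsight,
% together with the bound stated in the abstract:
\[
  \mathrm{Regret}_T
  = \sup_{\pi} \, \mathbb{E}\!\left[ \sum_{t=1}^{T} \left( V_t^{\pi} - V_t^{\pi_t} \right) \right]
  = \widetilde{\mathcal{O}}\!\left( \sqrt{dHT} \right).
\]
```

Under bandit feedback, the learner sees only the rewards $r_{t,h}(x_h, a_h)$ along the trajectory it actually follows, not the full reward function of the episode.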

The talk and the respective paper were published at the NeurIPS 2021 virtual conference.

Similar Papers

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

04/08/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

13:07

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

12/07/2020

Near-optimal Regret Bounds for Stochastic Shortest Path

Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

Keywords: Online Learning, Active Learning, and Bandits

15:57

12/07/2020

Online Learning with Imperfect Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords: Online Learning, Active Learning, and Bandits

13:17

18/07/2021

Online Learning in Unknown Markov Games

Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

Keywords: Theory, RL, Decisions and Control Theory

5:13

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

09/07/2020

Root-n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Kefan Dong, Jian Peng, Yining Wang, Yuan Zhou

Keywords: Reinforcement learning

12:46

06/12/2020

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

2:43

06/12/2020

Online Non-Convex Optimization with Imperfect Feedback

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, Thibaud Rahier

3:23

06/12/2021

Revisiting Smoothed Online Learning

Lijun Zhang, Wei Jiang, Shiyin Lu, Tianbao Yang

Keywords: optimization, online learning

12:36

06/12/2021

Learning-to-learn non-convex piecewise-Lipschitz functions

Maria-Florina Balcan, Mikhail Khodak, Dravyansh Sharma, Ameet S Talwalkar

Keywords: optimization, machine learning, robustness, meta learning, online learning

14:13

06/12/2020

Online learning with dynamics: A minimax perspective

Kush Bhatia, Karthik Sridharan

3:09

06/12/2020

Online Linear Optimization with Many Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

3:18

04/08/2021

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

20:29

04/08/2021

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

14:48

02/02/2021

Reinforcement Learning with Trajectory Feedback

Yonathan Efroni, Nadav Merlis, Shie Mannor

14:17

18/07/2021

Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization

Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

Keywords: Optimization, Convex Optimization

6:19

13/04/2021

Online k-means clustering

Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

2:52

12/07/2020

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

Keywords: Reinforcement Learning - Theory

15:14

18/07/2021

Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos Ruiz, Paul Grigas, Zheng Wen

Keywords: Deep Learning, Generative Models, Applications, Computer Vision; Applications, Visual Scene Analysis and Interpretation; Deep Learning, Adversarial Network, Algorithms, Online Learning Algorithms

5:15

06/12/2021

Bandit Phase Retrieval

Tor Lattimore, Botao Hao

Keywords: bandits

14:14

03/08/2020

What You See May Not Be What You Get: UCB Bandit Algorithms Robust to $\varepsilon$-Contamination

Laura Niss, Ambuj Tewari

8:02

12/07/2020

Projection-free Distributed Online Convex Optimization with $O(\sqrt{T})$ Communication Complexity

Yuanyu Wan, Wei-Wei Tu, Lijun Zhang

Keywords: Online Learning, Active Learning, and Bandits

11:48

18/07/2021

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords: Reinforcement Learning and Planning

5:20

09/07/2020

Online Learning with Vector Costs and Bandits with Knapsacks

Thomas Kesselheim, Sahil Singla

Keywords: Online learning, Approximation algorithms, Bandit problems

15:18

06/12/2021

Scheduling jobs with stochastic holding costs

Dabeen Lee, Milan Vojnovic

15:13

06/12/2021

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann, Ayush Sekhari, Claudio Gentile

Keywords: deep learning, active learning

10:43

06/12/2021

Littlestone Classes are Privately Online Learnable

Noah Golowich, Roi Livni

Keywords: machine learning, online learning, privacy

16:13

06/12/2021

Online Selective Classification with Limited Feedback

Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama

Keywords: machine learning, online learning

15:14

06/12/2021

Best-case lower bounds in online learning

Cristóbal Guzmán, Nishant Mehta, Ali Mortazavi

Keywords: theory, optimization, online learning, fairness

14:58

02/02/2021

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Min-hwan Oh, Garud Iyengar

15:47

06/12/2020

Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent

Dimitris Fotakis, Thanasis Lianeas, Georgios Piliouras, Stratis Skoulakis

2:53

06/12/2021

Optimal Algorithms for Stochastic Contextual Preference Bandits

Aadirupa Saha

Keywords: bandits

16:00

02/02/2021

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, Zhizhen Zhao

14:14

18/07/2021

Zeroth-Order Non-Convex Learning via Hierarchical Dual Averaging

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, Thibaud J Rahier

Keywords: Optimization, Non-Convex Optimization

5:25

13/04/2021

Online sparse reinforcement learning

Botao Hao, Tor Lattimore, Csaba Szepesvari, Mengdi Wang

2:58

18/07/2021

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues, Pierre Menard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Keywords: Theory, RL, Decisions and Control Theory

4:58

06/12/2020

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

Yuval Emek, Ron Lavi, Rad Niazadeh, Yangguang Shi

3:10

26/04/2020

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Yuanhao Wang, Jiachen Hu, Xiaoyu Chen, Liwei Wang

Keywords: Theory, Bandit Algorithms, Communication Efficiency

5:01