Gradient-free Online Learning in Continuous Games with Delayed Rewards

12/07/2020

Amélie Héliou, Panayotis Mertikopoulos, Zhengyuan Zhou

Keywords: Learning Theory

Abstract: Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order and with an a priori unbounded delay), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. Somewhat surprisingly, we find that under a standard diagonal concavity assumption, the induced sequence of play converges to Nash Equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
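
The mechanism described in the abstract can be illustrated with a small, hedged sketch. It is not the authors' algorithm: it assumes a single agent rather than a game, a hypothetical quadratic payoff, delays bounded by `max_delay`, and ad hoc step-size and perturbation schedules. It only shows the two ingredients named above: a single-point, payoff-based gradient estimate, and a priority queue that releases the oldest arrived reward first.

```python
import heapq
import math
import random

TARGET = (0.3, -0.2)  # hypothetical maximizer of the illustrative payoff


def quadratic_payoff(action):
    """Illustrative concave payoff (an assumption, not from the paper)."""
    return -sum((a - b) ** 2 for a, b in zip(action, TARGET))


def unit_vector(dim, rng):
    """Draw a direction uniformly at random from the unit sphere."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    return x if norm <= radius else [c * radius / norm for c in x]


def run_sketch(payoff, dim=2, rounds=5000, max_delay=5, seed=0):
    rng = random.Random(seed)
    x = [0.0] * dim   # pivot point, kept inside the unit ball
    in_flight = []    # heap keyed by the round in which the reward arrives
    arrived = []      # priority queue keyed by the round the action was played
    for t in range(1, rounds + 1):
        delta = t ** -0.25  # shrinking perturbation radius (assumed schedule)
        w = unit_vector(dim, rng)
        # Play a perturbed action; only its payoff is ever observed.
        action = [xi + delta * wi for xi, wi in zip(x, w)]
        reward = payoff(action)
        arrival = t + rng.randint(0, max_delay)  # rewards may arrive out of order
        heapq.heappush(in_flight, (arrival, t, reward, w, delta))
        # Move every reward that has arrived by round t into the priority queue.
        while in_flight and in_flight[0][0] <= t:
            _, played, r, w_s, delta_s = heapq.heappop(in_flight)
            heapq.heappush(arrived, (played, r, w_s, delta_s))
        # Use at most one queued reward per round, oldest action first.
        if arrived:
            _, r, w_s, delta_s = heapq.heappop(arrived)
            scale = (dim / delta_s) * r  # single-point gradient estimate: scale * w_s
            step = 0.5 / t               # assumed step-size schedule
            x = project_ball(
                [xi + step * scale * wi for xi, wi in zip(x, w_s)], 1.0
            )
    return x


if __name__ == "__main__":
    print(run_sketch(quadratic_payoff))
```

Each update uses a possibly stale estimate, taken at the pivot that was current when the action was played; this staleness is the effect the paper analyzes in the multi-agent setting, with delays that may be unbounded.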

The talk and the corresponding paper were published at the ICML 2020 virtual conference.

Similar Papers

06/12/2021

Making the most of your day: online learning for optimal allocation of time

Etienne Boursier, Tristan Garrec, Vianney Perchet, Marco Scarsini

Keywords: bandits, online learning

15:16

06/12/2021

Identifiability in inverse reinforcement learning

Haoyang Cao, Samuel Cohen, Lukasz Szpruch

Keywords: reinforcement learning and planning

15:07

06/12/2021

Rebounding Bandits for Modeling Satiation Effects

Liu Leqi, Fatma Kilinc Karzan, Zachary Lipton, Alan Montgomery

Keywords: bandits

13:49

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2021

Decentralized Q-learning in Zero-sum Markov Games

Muhammed Sayin, Kaiqing Zhang, David Leslie, Tamer Basar, Asuman Ozdaglar

Keywords: reinforcement learning and planning

15:07

06/12/2020

Online Bayesian Persuasion

Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

3:00

06/12/2021

Learning Equilibria in Matching Markets from Bandit Feedback

Meena Jagadeesan, Alexander Wei, Yixin Wang, Michael Jordan, Jacob Steinhardt

Keywords: bandits

15:04

06/12/2021

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords: deep learning, reinforcement learning and planning, representation learning

14:52

13/04/2021

Contextual blocking bandits

Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

2:47

16/11/2020

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Ramakanth Pasunuru, Han Guo, Mohit Bansal

Keywords: language tasks, optimization rewards, nlg tasks, question generation

11:34

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Keywords: theory, reinforcement learning and planning

14:22

06/12/2021

Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Keywords: reinforcement learning and planning, bandits, online learning

13:24

02/02/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

14:36

18/07/2021

Dynamic Planning and Learning under Recovering Rewards

David Simchi-Levi, Zeyu Zheng, Feng Zhu

Keywords: Reinforcement Learning and Planning, Bandits

4:53

02/02/2021

Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Stratis Skoulakis, Tanner Fiez, Ryann Sim, Georgios Piliouras, Lillian Ratliff

20:14

06/12/2021

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

Keywords: reinforcement learning and planning

14:11

18/11/2020

Constrained reinforcement learning via policy splitting

Haoxian Chen, Henry Lam, Fengpei Li, Amirhossein Meisami

10:34

06/12/2021

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

Keywords: theory, reinforcement learning and planning, bandits

12:14

06/12/2021

Counterbalancing Learning and Strategic Incentives in Allocation Markets

Jamie Kang, Faidra Monachou, Moran Koren, Itai Ashlagi

Keywords: generative model

13:47

06/12/2021

Regime Switching Bandits

Xiang Zhou, Yi Xiong, Ningyuan Chen, Xuefeng Gao

Keywords: reinforcement learning and planning, bandits, online learning

13:47

06/12/2020

Latent Bandits Revisited

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

3:11

06/12/2020

Preference-based Reinforcement Learning with Finite-Time Guarantees

Yichong Xu, Ruosong Wang, Lin Yang, Aarti Singh, Artur Dubrawski

3:04

06/12/2021

Continuous Mean-Covariance Bandits

Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

Keywords: bandits

11:33

26/08/2020

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

14:17

06/12/2021

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Keywords: reinforcement learning and planning

14:31

06/12/2020

A Bandit Learning Algorithm and Applications to Auction Design

Kim Thang Nguyen

2:43

03/08/2020

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Priyank Agrawal, Theja Tulabandhula

7:29

12/07/2020

Linear bandits with Stochastic Delayed Feedback

Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Keywords: Online Learning, Active Learning, and Bandits

13:25

06/12/2021

Reinforcement Learning in Reward-Mixing MDPs

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords: theory, reinforcement learning and planning

12:57

18/07/2021

Learning in Nonzero-Sum Stochastic Games with Potentials

David Mguni, Yutong Wu, Yali Du, Yaodong Yang, Ziyi Wang, M. Li, Ying Wen, Joel Jennings, Jun Wang

Keywords: Theory, Game Theory and Computational Economics

5:36

06/12/2021

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords: meta learning, bandits

14:58

18/07/2021

Joint Online Learning and Decision-making via Dual Mirror Descent

Alfonso Lobos Ruiz, Paul Grigas, Zheng Wen

Keywords: Deep Learning, Generative Models, Applications, Computer Vision; Applications, Visual Scene Analysis and Interpretation; Deep Learning, Adversarial Network, Algorithms, Online Learning Algorithms

5:15

06/12/2020

Cooperative Multi-player Bandit Optimization

Ilai Bistritz, Nicholas Bambos

3:13

19/08/2021

Learning in Markets: Greed Leads to Chaos but Following the Price is Right

Yun Kuen Cheung, Stefanos Leonardos, Georgios Piliouras

Keywords: Agent-based and Multi-agent Systems, Economic Paradigms, Auctions and Market-Based Systems, Multi-agent Learning, Noncooperative Games

15:26

06/12/2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords: theory, reinforcement learning and planning

13:47

02/02/2021

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

18:04

06/12/2021

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords: bandits

12:07

06/12/2020

Robust Multi-Agent Reinforcement Learning with Model Uncertainty

Kaiqing Zhang, Tao Sun, Yunzhe Tao, Sahika Genc, Sunil Mallya, Tamer Basar

3:11

26/08/2020

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

P Sharoff, Nishant Mehta, Ravi Ganti

15:01

06/12/2020

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

3:23