Convex Regularization in Monte-Carlo Tree Search

18/07/2021

Tuan Q Dam, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Keywords: Reinforcement Learning and Planning

Abstract: Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms to solve large-scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off the exploitation of good actions and the exploration of unvisited states, but their empirical success comes at the cost of poor sample efficiency and high computation time. In this paper, we overcome these limitations by introducing the use of convex regularization in Monte-Carlo Tree Search (MCTS) to drive exploration efficiently and to improve policy updates. First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the first regret analysis of regularized MCTS and showing that it guarantees an exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. We empirically verify the consequences of our theoretical results on a toy problem. Finally, we show how our framework can easily be incorporated into AlphaGo and we empirically show the superiority of convex regularization, w.r.t. representative baselines, on well-known RL problems across several Atari games.
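
To make the idea of a regularized backup concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single tree node with action-value estimates q and a temperature tau, and computes the value max over policies of (expected Q plus tau times an entropy term) in closed form for two standard choices: Shannon entropy (log-sum-exp value, softmax policy) and Tsallis entropy with index 2 (sparsemax policy). The function names max_entropy_backup, sparsemax and tsallis_backup are hypothetical, and the relative-entropy operator mentioned in the abstract is not covered here.

```python
import math

def max_entropy_backup(q, tau):
    """Shannon-entropy-regularized backup: value = tau * logsumexp(q / tau)."""
    z = [x / tau for x in q]
    m = max(z)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in z))
    policy = [math.exp(x - log_sum) for x in z]  # softmax(q / tau)
    return tau * log_sum, policy

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    cumsum = 0.0
    threshold = 0.0
    for k, zk in enumerate(sorted(z, reverse=True), start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:
            threshold = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break
    return [max(x - threshold, 0.0) for x in z]

def tsallis_backup(q, tau):
    """Backup regularized by the Tsallis entropy 0.5 * (1 - sum(pi^2))."""
    policy = sparsemax([x / tau for x in q])
    expected = sum(p * x for p, x in zip(policy, q))
    entropy = 0.5 * (1.0 - sum(p * p for p in policy))
    return expected + tau * entropy, policy

if __name__ == "__main__":
    q_values = [1.0, 1.5]  # illustrative numbers, not from the paper
    print(max_entropy_backup(q_values, 1.0))
    print(tsallis_backup(q_values, 1.0))
```

In this sketch the softmax policy always keeps every action at non-zero probability, whereas the sparsemax policy can assign exactly zero probability to clearly inferior actions; that sparsity is one intuition often given for preferring Tsallis entropy in tree search.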

Talk and the respective paper are published at the ICML 2021 virtual conference.

Similar Papers

19/08/2021

Hindsight Trust Region Policy Optimization

Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

Keywords: Machine Learning, Deep Reinforcement Learning, Reinforcement Learning

13:14

18/07/2021

Bilevel Optimization: Convergence Analysis and Enhanced Design

Kaiyi Ji, Junjie Yang, Yingbin Liang

Keywords: Optimization, Non-Convex Optimization

5:02

18/07/2021

Provably Efficient Algorithms for Multi-Objective Competitive RL

Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra

Keywords: Theory, RL, Decisions and Control Theory

17:04

18/07/2021

Distributed Second Order Methods with Fast Rates and Compressed Communication

Rustem Islamov, Xun Qian, Peter Richtarik

Keywords: Optimization

4:51

06/12/2020

Modeling and Optimization Trade-off in Meta-learning

Katelyn Gao, Ozan Sener

3:21

13/04/2021

Improving KernelSHAP: Practical Shapley value estimation using linear regression

Ian Covert, Su-In Lee

2:52

03/05/2021

Linear Last-iterate Convergence in Constrained Saddle-point Optimization

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

Keywords: Game Theory, Last-iterate Convergence, Optimistic Multiplicative Weights Update, Optimistic Gradient Descent Ascent, Optimistic Mirror Descent, Saddle-point Optimization

4:59

06/12/2020

Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal

3:05

06/12/2021

Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang

Keywords: optimization

8:58

26/08/2020

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

Waïss Azizian, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

14:40

12/07/2020

A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

Risheng Liu, Pan Mu, Xiaoming Yuan, Shangzhi Zeng, Jin Zhang

Keywords: Optimization - Non-convex

13:51

02/02/2021

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning

Will Dabney, André Barreto, Mark Rowland, Robert Dadashi, John Quan, Marc G. Bellemare, David Silver

20:06

18/07/2021

FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning

Tianhao Zhang, 岳珩李, Chen Wang, Guangming Xie, Zongqing Lu

Keywords: Reinforcement Learning and Planning, Multi-Agent RL

3:53

06/12/2021

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords: theory, reinforcement learning and planning, bandits, online learning

19:34

19/08/2021

Stability and Generalization for Randomized Coordinate Descent

Puyu Wang, Liang Wu, Yunwen Lei

Keywords: Machine Learning, Learning Theory, Online Learning

13:18

06/12/2020

Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Fan Zhou, Jianing Wang, Xingdong Feng

3:11

18/07/2021

Provably Correct Optimization and Exploration with Non-linear Policies

Fei Feng, Wotao Yin, Alekh Agarwal, Lin Yang

Keywords: Deep Learning, Adversarial Networks, Applications, Fairness, Accountability, and Transparency, Theory, RL, Decisions and Control Theory

5:03

06/12/2021

Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers

Julian Katz-Samuels, Blake Mason, Kevin Jamieson, Rob Nowak

Keywords: theory, machine learning, bandits, kernel methods, active learning

7:41

12/07/2020

A simpler approach to accelerated optimization: iterative averaging meets optimism

Pooria Joulani, Anant Raj, András György, Csaba Szepesvari

Keywords: Online Learning, Active Learning, and Bandits

16:17

06/12/2021

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Gal Dalal, Assaf Hallak, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

Keywords: theory, reinforcement learning and planning

15:00

06/12/2020

Chaos, Extremism and Optimism: Volume Analysis of Learning in Games

Yun Kuen Cheung, Georgios Piliouras

3:22

09/07/2020

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

15:16

06/12/2021

Unifying Width-Reduced Methods for Quasi-Self-Concordant Optimization

Deeksha Adil, Brian Bullins, Sushant Sachdeva

Keywords: optimization

12:14

26/04/2020

Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models

Yixuan Qiu, Lingsong Zhang, Xiao Wang

Keywords: energy model, restricted Boltzmann machine, contrastive divergence, unbiased Markov chain Monte Carlo, distribution coupling

4:34

06/12/2020

Robust, Accurate Stochastic Optimization for Variational Inference

Akash Kumar Dhaka, Alejandro Catalina, Michael Andersen, Måns Magnusson, Jonathan Huggins, Aki Vehtari

3:23

06/12/2021

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Fuchao Wei, Chenglong Bao, Yang Liu

Keywords: theory, deep learning, optimization, machine learning, vision

9:55

06/12/2020

Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Qi Zhou, Yufei Kuang, Zherui Qiu, Houqiang Li, Jie Wang

3:10

26/04/2020

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

Yuanhao Wang, Guodong Zhang, Jimmy Ba

Keywords: minimax optimization, smooth differentiable games, local convergence, generative adversarial networks, optimization

4:54

06/12/2020

Large-Scale Methods for Distributionally Robust Optimization

Daniel Levy, Yair Carmon, John Duchi, Aaron Sidford

3:11

02/02/2021

Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

Avi Matza, Yuval Bistritz

20:14

18/07/2021

Principal Component Hierarchy for Sparse Quadratic Programs

Robbie Vreugdenhil, Viet Anh Nguyen, Armin Eftekhari, Peyman Mohajerin Esfahani

Keywords: Deep Learning, Optimization, Convex Optimization, Applications, Natural Language Processing

5:14

12/07/2020

Responsive Safety in Reinforcement Learning

Adam Stooke, Joshua Achiam, Pieter Abbeel

Keywords: Reinforcement Learning - Deep RL

13:36

06/12/2021

Tactical Optimism and Pessimism for Deep Reinforcement Learning

Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael Jordan

Keywords: reinforcement learning and planning, bandits

6:30

13/04/2021

The base measure problem and its solution

Alexey Radul, Boris Alexeev

3:30

02/02/2021

Exact Reduction of Huge Action Spaces in General Reinforcement Learning

Sultan J. Majeed, Marcus Hutter

20:33

18/07/2021

Robust Unsupervised Learning via L-statistic Minimization

Andreas Maurer, Daniela Angela Parletta, Andrea Paudice, Massimiliano Pontil

Keywords: Theory, Statistical Learning Theory

5:03

12/07/2020

Stochastic Hamiltonian Gradient Methods for Smooth Games

Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas

Keywords: Optimization - Non-convex

16:16

06/12/2020

Demystifying Orthogonal Monte Carlo and Beyond

Han Lin, Haoxian Chen, Krzysztof M Choromanski, Tianyi Zhang, Clement Laroche

3:19

06/12/2020

Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

Jiajin Li, Caihua Chen, Anthony Man-Cho So

Keywords: Algorithms -> Meta-Learning; Applications -> Object Recognition; Data, Challenges, Implementations, and Software -> Benchmarks; Algorithms -> Multitask and Transfer Learning

3:02

18/07/2021

Sample-Optimal PAC Learning of Halfspaces with Malicious Noise

Jie Shen

Keywords: Theory, Computational Learning Theory

4:37