POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Abstract: Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces. In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics. We introduce POLY-HOOT, an algorithm that augments MCTS with a continuous armed bandit strategy named Hierarchical Optimistic Optimization (HOO) (Bubeck et al., 2011). Specifically, we enhance HOO by using an appropriate polynomial, rather than logarithmic, bonus term in the upper confidence bounds. Such a polynomial bonus is motivated by its empirical successes in AlphaGo Zero (Silver et al., 2017b), as well as its significant role in achieving theoretical guarantees of finite space MCTS (Shah et al., 2019). We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems. Using this result as a building block, we establish non-asymptotic convergence guarantees for POLY-HOOT: the value estimate converges to an arbitrarily small neighborhood of the optimal value function at a polynomial rate. We further provide experimental results that corroborate our theoretical findings.

13/04/2021

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Basar

Comments

Similar Papers

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

Keywords Abstract Paper

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Keywords Abstract Paper

Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations

Zhuoran Yang, Chi Jin, Zhaoran Wang and Mengdi Wang, Michael Jordan

Keywords Abstract Paper

Optimal Gradient-based Algorithms for Non-concave Bandit Optimization

Baihe Huang, Kaixuan Huang, Sham Kakade and Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords Abstract Paper

theory, deep learning, optimization, generative model, bandits

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Kenji Kawaguchi, Haihao Lu

Keywords Abstract Paper

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

Dylan J Foster, Akshay Krishnamurthy

Keywords Abstract Paper

theory, reinforcement learning and planning, bandits, online learning

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Keywords Abstract Paper

Theory, Deep learning Theory

Projection-free Online Learning in Dynamic Environments

Yuanyu Wan, Bo Xue, Lijun Zhang

Keywords Abstract Paper

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

Xu Cai, Jonathan Scarlett

Keywords Abstract Paper

Applications, Natural Language Processing, Applications, Network Analysis, Reinforcement Learning and Planning, Bandits

An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits

Julian Katz-Samuels, Lalit Jain, zohar karnin, Kevin Jamieson

Keywords Abstract Paper

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Abstract Paper

meta learning, bandits

Last iterate convergence of SGD for Least-Squares in the Interpolation regime.

Aditya Vardhan Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

Keywords Abstract Paper

deep learning, optimization

Large-Scale Methods for Distributionally Robust Optimization

Daniel Levy, Yair Carmon, John Duchi, Aaron Sidford

Keywords Abstract Paper

Improved Regret Bounds for Projection-free Bandit Convex Optimization

Dan Garber, Ben Kretzu

Keywords Abstract Paper

Low-rank generalized linear bandit problems

Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

Keywords Abstract Paper

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

Keywords Abstract Paper

optimization, reinforcement learning and planning, bandits

Distributionally Robust Optimization with Markovian Data

Mengmeng Li, Tobias Sutter, Daniel Kuhn

Keywords Abstract Paper

Optimization

Beyond Variance Reduction: Understanding the True Impact of Baselines on Policy Optimization

Wes Chung, Valentin Thomas, Marlos C. Machado, Nicolas Le Roux

Keywords Abstract Paper

Reinforcement Learning and Planning

Bellman-consistent Pessimism for Offline Reinforcement Learning

Tengyang Xie, Ching-An Cheng, Nan Jiang and Paul Mineiro, Alekh Agarwal

Keywords Abstract Paper

theory, reinforcement learning and planning, robustness

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

Keywords Abstract Paper

Minimizing Dynamic Regret and Adaptive Regret Simultaneously

Lijun Zhang, Shiyin Lu, Tianbao Yang

Keywords Abstract Paper

SLIP: Learning to predict in unknown dynamical systems with long-term memory

Paria Rashidinejad, Jiantao Jiao, Stuart Russell

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Zhuoran Yang, Chi Jin, Zhaoran Wang and
Mengdi Wang, Michael Jordan

Keywords Paper

Baihe Huang, Kaixuan Huang, Sham Kakade and
Jason Lee, Qi Lei, Runzhe Wang, Jiaqi Yang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tengyang Xie, Ching-An Cheng, Nan Jiang and
Paul Mineiro, Alekh Agarwal

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Chen-Yu Wei, Mehdi Jafarnia, Haipeng Luo and
Hiteshi Sharma, Rahul Jain

Keywords Paper

Mingrui Zhang, Zebang Shen, Aryan Mokhtari and
Hamed Hassani, Amin Karbasi

Keywords Paper

Keywords Paper

Keywords Paper

Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori and
Pooria Joulani, Csaba Szepesvari

Keywords Paper

Keywords Paper

Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar and
Adam Klivans, Mahdi Soltanolkotabi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Fan Bao, Guoqiang Wu, Chongxuan LI and
Jun Zhu, Bo Zhang

Keywords Paper

Keywords Paper