Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization

Abstract: We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions $\f_t$ admit a "pseudo-1d" structure, i.e. $\f_t(\w) = \loss_t(\pred_t(\w))$ where the output of $\pred_t$ is one-dimensional. At each round, the learner observes context $\x_t$, plays prediction $\pred_t(\w_t; \x_t)$ (e.g. $\pred_t(\cdot)=\langle \x_t, \cdot\rangle$) for some $\w_t \in \mathbb{R}^d$ and observes loss $\loss_t(\pred_t(\w_t))$ where $\loss_t$ is a convex Lipschitz-continuous function. The goal is to minimize the standard regret metric. This pseudo-1d bandit convex optimization problem (\SBCO) arises frequently in domains such as online decision-making or parameter-tuning in large systems. For this problem, we first show a regret lower bound of $\min(\sqrt{dT}, T^{3/4})$ for any algorithm, where $T$ is the number of rounds. We propose a new algorithm \sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively, guaranteeing the {\em optimal} regret bound mentioned above, up to additional logarithmic factors. In contrast, applying state-of-the-art online convex optimization methods leads to $\tilde{O}\left(\min\left(d^{9.5}\sqrt{T},\sqrt{d}T^{3/4}\right)\right)$ regret, that is significantly suboptimal in terms of $d$.

06/12/2021

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

17:27

04/08/2021

Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization

Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

Comments

Similar Papers

Logarithmic Regret from Sublinear Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords Abstract Paper

optimization, online learning

Bandit Phase Retrieval

Tor Lattimore, Botao Hao

Keywords Abstract Paper

bandits

Optimal Dynamic Regret in Exp-Concave Online Learning

Dheeraj Baby, Yu-Xiang Wang

Keywords Abstract Paper

Improved Regret Bounds for Projection-free Bandit Convex Optimization

Dan Garber, Ben Kretzu

Keywords Abstract Paper

Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

Keywords Abstract Paper

Bandit problems, Online learning

Naive Exploration is Optimal for Online LQR

Max Simchowitz, Dylan Foster

Keywords Abstract Paper

Reinforcement Learning - Theory

Making Non-Stochastic Control (Almost) as Easy as Stochastic

Max Simchowitz

Keywords Abstract Paper

Adaptive Online Estimation of Piecewise Polynomial Trends

Dheeraj Baby, Yu-Xiang Wang

Keywords Abstract Paper

Revisiting Smoothed Online Learning

Lijun Zhang, Wei Jiang, Shiyin Lu, Tianbao Yang

Keywords Abstract Paper

optimization, online learning

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

Keywords Abstract Paper

Private Stochastic Convex Optimization: Optimal Rates in L1 Geometry

Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Keywords Abstract Paper

Deep Learning, Algorithms, Multitask and Transfer Learning; Algorithms, Online Learning, Social Aspects of Machine Learning, Privacy, Anonymity, and Security

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Keywords Abstract Paper

Optimal Order Simple Regret for Gaussian Process Bandits

Sattar Vakili, Nacime Bouziani, Sepehr Jalali and Alberto Bernacchia, Da-shan Shiu

Keywords Abstract Paper

optimization, reinforcement learning and planning, bandits, kernel methods

Exponential Weights Algorithms for Selective Learning

Mingda Qiao, Gregory Valiant

Keywords Abstract Paper

Online Learning with Vector Costs and Bandits with Knapsacks

Thomas Kesselheim, Sahil Singla

Keywords Abstract Paper

Online learning, Approximation algorithms, Bandit problems

Combinatorial gaussian process bandits with probabilistically triggered arms

Ilker Demirel, Cem Tekin

Keywords Abstract Paper

Delay and Cooperation in Nonstochastic Linear Bandits

Shinji Ito, Daisuke Hatano, Hanna Sumita and Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

Keywords Abstract Paper

Littlestone Classes are Privately Online Learnable

Noah Golowich, Roi Livni

Keywords Abstract Paper

machine learning, online learning, privacy

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

Keywords Abstract Paper

UCB-based Algorithms for Multinomial Logistic Regression Bandits

Sanae Amani, Christos Thrampoulidis

Keywords Abstract Paper

bandits

Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent

Dimitris Fotakis, Thanasis Lianeas, Georgios Piliouras, Stratis Skoulakis

Keywords Abstract Paper

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

Liyu Chen, Haipeng Luo, Chen-Yu Wei

Keywords Abstract Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Sattar Vakili, Nacime Bouziani, Sepehr Jalali and
Alberto Bernacchia, Da-shan Shiu

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Shinji Ito, Daisuke Hatano, Hanna Sumita and
Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias and
Pasin Manurangsi, Renato Leme, Jon Schneider

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper