Online Learning with Imperfect Hints

Abstract: We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a ``hint'' vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving over the $O(\sqrt{T})$ regret in the general setting. However, the result and analysis require the correlation property at \emph{all} time steps, thus raising the natural question: can we design online learning algorithms that are resilient to bad hints? In this paper we develop algorithms and nearly matching lower bounds for online learning with imperfect hints. Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case. Our results also generalize, simplify, and improve upon previous results on optimistic regret bounds, which can be viewed as an additive version of hints.
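To make the setting in the abstract concrete, here is a minimal sketch of the online linear optimization protocol with hints. It is not the algorithm from the paper. It assumes the action set is the Euclidean unit ball and that cost vectors have unit norm; neither is stated in the abstract. All function names (`run_ogd`, `run_follow_hint`, `regret`, and so on) are hypothetical. The sketch compares projected online gradient descent, which ignores hints, with a naive strategy that plays the negated hint. The naive strategy does well when hints are correlated with the costs and fails when they are not, which is the fragility that motivates hint-robust algorithms.

```python
import math
import random


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_to_unit_ball(v):
    # Assumption: the action set is the Euclidean unit ball.
    n = norm(v)
    return list(v) if n <= 1.0 else [x / n for x in v]


def regret(actions, costs):
    """Total cost incurred minus the cost of the best fixed action in hindsight."""
    total = sum(dot(x, c) for x, c in zip(actions, costs))
    cost_sum = [sum(col) for col in zip(*costs)]
    best_fixed = -norm(cost_sum)  # min over ||u|| <= 1 of <u, sum_t c_t>
    return total - best_fixed


def run_ogd(costs):
    """Projected online gradient descent with step size 1/sqrt(t); uses no hints."""
    x = [0.0] * len(costs[0])
    actions = []
    for t, c in enumerate(costs, start=1):
        actions.append(x)  # the action is chosen before the cost c is revealed
        eta = 1.0 / math.sqrt(t)
        x = project_to_unit_ball([xi - eta * ci for xi, ci in zip(x, c)])
    return actions


def run_follow_hint(costs, hints):
    """Naive illustration only: play the negated, normalised hint each round."""
    actions = []
    for h in hints:
        n = norm(h)
        actions.append([-x / n for x in h] if n > 0 else [0.0] * len(h))
    return actions


def random_unit_vectors(T, d, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(T):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = norm(v)
        out.append([x / n for x in v])
    return out


if __name__ == "__main__":
    T, d = 1000, 5
    costs = random_unit_vectors(T, d)
    good_hints = costs                             # perfectly correlated hints
    bad_hints = [[-x for x in c] for c in costs]   # anti-correlated hints
    print("OGD, no hints:     ", regret(run_ogd(costs), costs))
    print("follow good hints: ", regret(run_follow_hint(costs, good_hints), costs))
    print("follow bad hints:  ", regret(run_follow_hint(costs, bad_hints), costs))
```

With anti-correlated hints the naive strategy pays cost 1 every round, so its regret grows linearly in $T$, whereas the hint-free baseline stays within the standard $O(\sqrt{T})$ bound regardless of hint quality.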

06/12/2020

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Similar Papers

Temporal Variability in Implicit Online Learning

Nicolò Campolongo, Francesco Orabona

Efficient Bandit Convex Optimization: Beyond Linear Losses

Arun Sai Suggala, Pradeep Ravikumar, Praneeth Netrapalli

Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games

Arun Suggala, Praneeth Netrapalli

Online Linear Optimization with Many Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Experimental design for regret minimization in linear bandits

Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

Online Learning in Unknown Markov Games

Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

Keywords: Theory, RL, Decisions and Control Theory

Delay and Cooperation in Nonstochastic Linear Bandits

Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-Ichi Kawarabayashi

Logarithmic Regret from Sublinear Hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Keywords: optimization, online learning

Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour

Online k-means clustering

Vincent Cohen-Addad, Benjamin Guedj, Varun Kanade, Guy Rom

Online Learning with Primary and Secondary Losses

Avrim Blum, Han Shao

Dynamic Regret of Convex and Smooth Functions

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

Learning Online Algorithms with Distributional Advice

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Ali Vakilian, Nikos Zarifis

Keywords: Algorithms

A Simple Online Algorithm for Competing with Dynamic Comparators

Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

Online Learning with Continuous Variations: Dynamic Regret and Reductions

Ching-An Cheng, Jonathan Lee, Ken Goldberg, Byron Boots

Minimizing Dynamic Regret and Adaptive Regret Simultaneously

Lijun Zhang, Shiyin Lu, Tianbao Yang

Power of hints for online learning with movement costs

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Keywords: reinforcement learning and planning, bandits, online learning

Robust Online Convex Optimization in the Presence of Outliers

Tim van Erven, Sarah Sachs, Wouter M Koolen, Wojciech Kotlowski

Online Knapsack with Frequency Predictions

Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, Manish Purohit

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang

Keywords: Reinforcement learning, Planning and control

Projection-free Online Learning in Dynamic Environments

Yuanyu Wan, Bo Xue, Lijun Zhang

Best-case lower bounds in online learning

Cristóbal Guzmán, Nishant Mehta, Ali Mortazavi

Keywords: theory, optimization, online learning, fairness

Neural Active Learning with Performance Guarantees

Zhilei Wang, Pranjal Awasthi, Christoph Dann, Ayush Sekhari, Claudio Gentile

Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, Zhizhen Zhao

Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric