Best-item Learning in Random Utility Models with Subset Choices

Abstract: We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of $O(\frac{n}{c^2\epsilon^2} \log \frac{k}{\delta})$ rounds to identify an $\epsilon$-optimal item with confidence $1-\delta$, when the worst case pairwise advantage in the RUM has sensitivity at least $c$ to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on $n,k$ and $c$.
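
As a rough illustration of the setting described in the abstract, the following sketch simulates winner feedback from a RUM and runs a simple group-wise elimination. It is not the authors' algorithm: it assumes Gumbel noise (one particular RUM instance), replaces the pairwise relative counts with plain win counts inside each played subset, and takes a fixed, hand-picked number of rounds per group instead of one derived from $\epsilon$, $\delta$ and $c$. The names `play_subset`, `eliminate` and `rounds_per_group` are hypothetical.

```python
import math
import random


def play_subset(utilities, subset, rng):
    """Play one round: return the item in `subset` with the largest noisy utility.

    Assumption: independent Gumbel noise is added to each latent utility.
    """
    def gumbel():
        u = rng.random()
        while u <= 0.0:  # guard against log(0)
            u = rng.random()
        return -math.log(-math.log(u))

    return max(subset, key=lambda i: utilities[i] + gumbel())


def eliminate(utilities, k, rounds_per_group, seed=0):
    """Group-wise elimination: keep only the empirical winner of each group.

    Items are split into groups of at most `k`, each group is played
    `rounds_per_group` times, and the item with the most wins survives.
    The process repeats until a single item remains. Requires k >= 2.
    """
    rng = random.Random(seed)
    survivors = list(range(len(utilities)))
    while len(survivors) > 1:
        rng.shuffle(survivors)
        next_round = []
        for start in range(0, len(survivors), k):
            group = survivors[start:start + k]
            if len(group) == 1:
                # A leftover singleton advances without being played.
                next_round.append(group[0])
                continue
            wins = {i: 0 for i in group}
            for _ in range(rounds_per_group):
                wins[play_subset(utilities, group, rng)] += 1
            next_round.append(max(group, key=wins.get))
        survivors = next_round
    return survivors[0]


if __name__ == "__main__":
    # Hypothetical instance: 8 items, the last one has a clearly larger utility.
    latent = [0.0] * 7 + [5.0]
    print(eliminate(latent, k=4, rounds_per_group=200))
```

Each pass reduces the number of surviving items by roughly a factor of $k$, which is the intuition behind a hierarchical scheme; the per-group sample sizes needed for an $(\epsilon, \delta)$ guarantee are the subject of the paper and are not reproduced here.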

12/07/2020

Aadirupa Saha (Bangalore), Aditya Gopalan (Bangalore)

Similar Papers

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

Aadirupa Saha, Aditya Gopalan

Online Learning, Active Learning, and Bandits

The Sample Complexity of Best-$k$ Items Selection from Pairwise Comparisons

Wenbo Ren, Jia Liu, Ness Shroff

Optimal Algorithms for Stochastic Contextual Preference Bandits

Multi-group Agnostic PAC Learnability

Guy Rothblum, Gal Yona

Social Aspects of Machine Learning, Fairness, Accountability, and Transparency

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

Yihan Du, Yuko Kuroki, Wei Chen

Choice Bandits

Arpit Agarwal, Nicholas Johnson, Shivani Agarwal

Exponential Weights Algorithms for Selective Learning

Mingda Qiao, Gregory Valiant

Bandit Phase Retrieval

Tor Lattimore, Botao Hao

Contextual Combinatorial Volatile Multi-armed Bandit with Adaptive Discretization

Andi Nika, Sepehr Elahi, Cem Tekin

Robust learning under clean-label attack

Avrim Blum, Steve Hanneke, Jian Qian, Han Shao

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

Haobo Jiang, Jin Xie, Jian Yang

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

theory, reinforcement learning and planning

Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

Min-hwan Oh, Garud Iyengar

Nearly Horizon-Free Offline Reinforcement Learning

Tongzheng Ren, Jialian Li, Bo Dai, Simon Du, Sujay Sanghavi

theory, optimization, reinforcement learning and planning

Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Aymen Al Marjani, Alexandre Proutiere

Theory, RL, Decisions and Control Theory

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Bingyan Wang, Yuling Yan, Jianqing Fan

theory, reinforcement learning and planning, generative model

Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

deep learning, reinforcement learning and planning, representation learning

Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning

Ming Yin, Yu-Xiang Wang

Bayesian decision-making under misspecified priors with applications to meta-learning

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

meta learning, bandits

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate

Akifumi Okuno, Hidetoshi Shimodaira

Robust causal inference under covariate shift via worst-case subpopulation treatment effects

Sookyo Jeong, Hongseok Namkoong
