Optimizing percentile criterion using robust MDPs

Abstract: We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile criterion, can be optimized using Robust MDPs (RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially from given ambiguity sets. We show that the RMDP solution’s sub-optimality depends on the spans of the ambiguity sets along the value function. We then propose new algorithms that minimize the span of ambiguity sets defined by weighted L1 and L-infinity norms. Our primary focus is on Bayesian guarantees, but we also describe how our methods apply to frequentist guarantees and derive new concentration inequalities for weighted L1 and L-infinity norms. Experimental results indicate that our optimized ambiguity sets improve significantly on prior construction methods.

18/07/2021

Optimizing percentile criterion using robust MDPs

Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

Comments

Similar Papers

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin LIANG

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Adversarially robust estimate and risk analysis in linear regression

Yue Xing, Ruizhi Zhang, Guang Cheng

Keywords Abstract Paper

Twice regularized MDPs and the equivalence between robustness and regularization

Esther Derman, Matthieu Geist, Shie Mannor

Keywords Abstract Paper

optimization, reinforcement learning and planning, robustness

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

Sebastian Curi, Ilija Bogunovic, Andreas Krause

Keywords Abstract Paper

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Keywords Abstract Paper

Conformal Bayesian Computation

Edwin Fong, Chris C Holmes

Keywords Abstract Paper

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Abstract Paper

Reinforcement Learning and Planning, Multi-Agent RL

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Ruosong Wang, Dean Foster, Sham M Kakade

Keywords Abstract Paper

batch reinforcement learning, representation, function approximation, lower bound

Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting

Frederic Koehler, Lijia Zhou, [deadname] J Sutherland, Nathan Srebro

Keywords Abstract Paper

Benign Overfitting of Constant-Stepsize SGD for Linear Regression

Difan Zou, Jingfeng Wu, Vladimir Braverman and Quanquan Gu, Sham Kakade

Keywords Abstract Paper

Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions

Huan Ma, Zongbo Han, Changqing Zhang and Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

Keywords Abstract Paper

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

Keywords Abstract Paper

deep learning, reinforcement learning and planning

Post-Contextual-Bandit Inference

Aurelien Bibaut, Maria Dimakopoulou, Nathan Kallus and Antoine Chambaz, Mark van der Laan

Keywords Abstract Paper

bandits, causality

Minimax Weight and Q-Function Learning for Off-Policy Evaluation

Masatoshi Uehara, Jiawei Huang, Nan Jiang

Keywords Abstract Paper

Stability and Generalization of Bilevel Programming in Hyperparameter Optimization

Fan Bao, Guoqiang Wu, Chongxuan LI and Jun Zhu, Bo Zhang

Keywords Abstract Paper

Estimating Principal Components under Adversarial Perturbations

Pranjal Awasthi, Xue Chen, Aravindan Vijayaraghavan

Keywords Abstract Paper

Unsupervised and semi-supervised learning, Adversarial learning and robustness

Average-Reward Reinforcement Learning with Trust Region Methods

Xiaoteng Ma, Xiaohang Tang, Li Xia and Jun Yang, Qianchuan Zhao

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Markov Decision Processes

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Keywords Abstract Paper

CoinDICE: Off-Policy Confidence Interval Estimation

Bo Dai, Ofir Nachum, Yinlam Chow and Lihong Li, Csaba Szepesvari, Dale Schuurmans

Keywords Abstract Paper

Learning prediction intervals for regression: Generalization and calibration

Haoxian Chen, Ziyi Huang, Henry Lam and Huajie Qian, Haofeng Zhang

Keywords Abstract Paper

Momentum-Based Policy Gradient Methods

Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang

Keywords Abstract Paper

A Unifying View of Optimism in Episodic Reinforcement Learning

Gergely Neu, Ciara Pike-Burke

Keywords Abstract Paper

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Keywords Paper

Keywords Paper

Difan Zou, Jingfeng Wu, Vladimir Braverman and
Quanquan Gu, Sham Kakade

Keywords Paper

Huan Ma, Zongbo Han, Changqing Zhang and
Huazhu Fu, Joey Tianyi Zhou, Qinghua Hu

Keywords Paper

Keywords Paper

Aurelien Bibaut, Maria Dimakopoulou, Nathan Kallus and
Antoine Chambaz, Mark van der Laan

Keywords Paper

Keywords Paper

Fan Bao, Guoqiang Wu, Chongxuan LI and
Jun Zhu, Bo Zhang

Keywords Paper

Keywords Paper

Xiaoteng Ma, Xiaohang Tang, Li Xia and
Jun Yang, Qianchuan Zhao

Keywords Paper

Keywords Paper

Bo Dai, Ofir Nachum, Yinlam Chow and
Lihong Li, Csaba Szepesvari, Dale Schuurmans

Keywords Paper

Haoxian Chen, Ziyi Huang, Henry Lam and
Huajie Qian, Haofeng Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Son Nguyen, Duong Nguyen, Khai Nguyen and
Khoat Than, Hung Bui, Nhat Ho

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper