13/04/2021

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Zhengqing Zhou, Zhengyuan Zhou, Qinxun Bai, Linhai Qiu, Jose Blanchet, Peter Glynn

Abstract: While reinforcement learning (RL) has recently witnessed tremendous success across a wide range of domains, robustness, or the lack thereof, remains an important and inadequately addressed issue. In this paper, we provide a distributionally robust formulation of offline policy learning in tabular RL, which aims to learn, from historical data collected by some behavior policy, a policy that is robust to a future environment arising as a perturbation of the training environment. We first develop a novel policy evaluation scheme that accurately estimates the robust value of any given policy (i.e., how well it performs in a perturbed environment) and establish its finite-sample estimation error. Building on this, we then develop a novel, minimax-optimal distributionally robust learning algorithm that achieves $O_P(1/\sqrt{n})$ regret, meaning that with high probability, the policy learned from $n$ training data points will be $O(1/\sqrt{n})$-close to the optimal distributionally robust policy. Finally, our simulation results demonstrate the superiority of our distributionally robust approach over non-robust RL algorithms.
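To make the formulation concrete, below is a minimal sketch (not the paper's own algorithm, whose details are not given in this abstract) of distributionally robust value iteration on an empirical tabular MDP, assuming a KL-divergence uncertainty set of radius rho around the estimated transition probabilities. The function names robust_bellman_value and robust_value_iteration, and the parameter rho, are illustrative. The inner worst-case expectation over the KL ball is computed via its standard convex dual, which reduces it to a one-dimensional maximization over a scalar beta.

import numpy as np
from scipy.optimize import minimize_scalar

def robust_bellman_value(p_hat, v, rho):
    """Worst-case expected next-state value over a KL ball of radius rho
    around the empirical transition distribution p_hat.

    Uses the standard dual form:
        min_{KL(p || p_hat) <= rho} p . v
          = max_{beta > 0} -beta * log E_{p_hat}[exp(-v / beta)] - beta * rho
    """
    def neg_dual(beta):
        # Log-sum-exp trick for numerical stability.
        x = -v / beta
        m = x.max()
        lse = m + np.log(np.dot(p_hat, np.exp(x - m)))
        return -(-beta * lse - beta * rho)

    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun

def robust_value_iteration(P_hat, R, gamma, rho, iters=200):
    """Robust value iteration on an empirical tabular MDP.

    P_hat: (S, A, S) empirical transition probabilities
    R:     (S, A) rewards
    """
    S, A, _ = P_hat.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                # Pessimistic backup: worst transition model in the KL ball.
                q[s, a] = R[s, a] + gamma * robust_bellman_value(P_hat[s, a], v, rho)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)

The dual form is what makes this tractable: instead of optimizing over all distributions in the KL ball, each backup only requires a bounded scalar search, so the per-iteration cost stays close to that of ordinary value iteration.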

Talk and paper published at the AISTATS 2021 virtual conference.
