Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Abstract: In this paper we consider multi-objective reinforcement learning where the objectives are balanced using preferences. In practice, the preferences are often given in an adversarial manner, e.g., customers can be picky in many applications. We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions. We consider two settings. In the online setting, the agent receives a (adversarial) preference every episode and proposes policies to interact with the environment. We provide a model-based algorithm that achieves a nearly minimax optimal regret bound $\widetilde{\mathcal{O}}\bigl(\sqrt{\min\{d,S\}\cdot H^2 SAK}\bigr)$, where $d$ is the number of objectives, $S$ is the number of states, $A$ is the number of actions, $H$ is the length of the horizon, and $K$ is the number of episodes. Furthermore, we consider preference-free exploration, i.e., the agent first interacts with the environment without specifying any preference and then is able to accommodate arbitrary preference vector up to $\epsilon$ error. Our proposed algorithm is provably efficient with a nearly optimal trajectory complexity $\widetilde{\mathcal{O}}\bigl({\min\{d,S\}\cdot H^3 SA}/{\epsilon^2}\bigr)$. This result partly resolves an open problem raised by \citet{jin2020reward}.

06/12/2021

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Jingfeng Wu, Vladimir Braverman, Lin Yang

Comments

Similar Papers

Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris and Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords Abstract Paper

Near-optimal Regret Bounds for Stochastic Shortest Path

Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

Keywords Abstract Paper

Online Learning, Active Learning, and Bandits

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Tianhao Wang, Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

Keywords Abstract Paper

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Keywords Abstract Paper

Online learning in MDPs with linear function approximation and bandit feedback.

Gergely Neu, Julia Olkhovskaya

Keywords Abstract Paper

reinforcement learning and planning, bandits, online learning

Reinforcement Learning with Trajectory Feedback

Yonathan Efroni, Nadav Merlis, Shie Mannor

Keywords Abstract Paper

Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning

Guanlin Liu, Lifeng LAI

Keywords Abstract Paper

reinforcement learning and planning, adversarial robustness and security

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Jean Tarbouriech, Runlong Zhou, Simon Du and Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Abstract Paper

theory, reinforcement learning and planning

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou, Jiafan He, Quanquan Gu

Keywords Abstract Paper

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

Keywords Abstract Paper

Optimal Algorithms for Stochastic Contextual Preference Bandits

Keywords Abstract Paper

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Bingyan Wang, Yuling Yan, Jianqing Fan

Keywords Abstract Paper

theory, reinforcement learning and planning, generative model

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Xuezhou Zhang, Yuzhe Ma, Adish Singla, Jerry Zhu

Keywords Abstract Paper

Model-based Reinforcement Learning for Continuous Control with Posterior Sampling

Ying Fan, Yifei Ming

Keywords Abstract Paper

Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

jiafan he, Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

RL for Latent MDPs: Regret Guarantees and a Lower Bound

Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

Keywords Abstract Paper

Learning One Representation to Optimize All Rewards

Ahmed Touati, Yann Ollivier

Keywords Abstract Paper

deep learning, reinforcement learning and planning, representation learning

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Kaiqing Zhang, Sham Kakade, Tamer Basar, Lin Yang

Keywords Abstract Paper

Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation

Weitong ZHANG, Dongruo Zhou, Quanquan Gu

Keywords Abstract Paper

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Yingjie Fei, Zhuoran Yang, Yudong Chen and Zhaoran Wang, Qiaomin Xie

Keywords Abstract Paper

Near Optimal Reward-Free Reinforcement Learning

Zhang Zihan, Simon Du, Xiangyang Ji

Keywords Abstract Paper

Theory, RL, Decisions and Control Theory

Dynamic Regret of Policy Optimization in Non-Stationary Environments

Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie

Keywords Abstract Paper

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Lin Yang, Yu-Zhen Janice Chen, Stephen Pasteris and
Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jean Tarbouriech, Runlong Zhou, Simon Du and
Matteo Pirotta, Michal Valko, Alessandro Lazaric

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yingjie Fei, Zhuoran Yang, Yudong Chen and
Zhaoran Wang, Qiaomin Xie

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy and
Daniel Hsu, Thodoris Lykouris, Miro Dudik, Robert E Schapire

Keywords Paper

Keywords Paper

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang and
Zhaoran Wang, Mihailo Jovanovic

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tongzheng Ren, Jialian Li, Bo Dai and
Simon Du, Sujay Sanghavi

Keywords Paper

Keywords Paper

Diederik M. Roijers , Luisa M Zintgraf, Pieter Libin and
Mathieu Reymond , Eugenio Bargiacchi, Ann Nowé

Keywords Paper

David Lindner, Matteo Turchetta, Sebastian Tschiatschek and
Kamil Ciosek, Andreas Krause

Keywords Paper

Keywords Paper

Keywords Paper