Abstract:
Multiple-environment Markov decision processes (MEMDPs) are MDPs equipped
with not one, but multiple probabilistic transition functions, which
represent the various possible unknown environments.
While previous research on MEMDPs focused on theoretical properties for
long-run average payoff, we study them with discounted-sum payoff and
focus on their practical advantages and applications. MEMDPs can be viewed
as a special case of partially observable MDPs and mixed-observability MDPs: the
state of the system is perfectly observable, but not the environment. We
show that the specific structure of MEMDPs allows for more efficient
algorithmic analysis, in particular for faster belief updates. We experimentally
demonstrate the applicability of MEMDPs in several domains, including contextual recommendation systems and parameterized Markov decision processes.
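To illustrate why an observable state can make belief updates cheap, here is a minimal sketch. It is one plausible reading, not necessarily the authors' method. It assumes finite state and environment sets and a dictionary encoding of each environment's transition function. The names `Transition` and `belief_update` are hypothetical. Because the state is observed, the belief only needs to range over the k environments. After observing a transition (s, a, s'), each environment's weight is multiplied by its probability of that transition and the result is renormalised, which costs O(k) per step.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Hypothetical encoding (an assumption, not from the source): one transition
# function per environment, mapping (s, a) to a distribution {s': probability}.
Transition = Dict[Tuple[State, Action], Dict[State, float]]


def belief_update(belief: List[float], envs: List[Transition],
                  s: State, a: Action, s_next: State) -> List[float]:
    """Bayesian update of the belief over environments after observing (s, a, s_next).

    The state is fully observed, so only the environment index is uncertain:
    each environment's weight is scaled by its probability of the observed
    transition, and the weights are then renormalised.
    """
    weights = [b * env.get((s, a), {}).get(s_next, 0.0)
               for b, env in zip(belief, envs)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed transition has zero probability "
                         "in every environment with positive belief")
    return [w / total for w in weights]


# Usage: two hypothetical environments that differ only in how likely
# action "a" moves the system from "s0" to "s1".
envs = [
    {("s0", "a"): {"s1": 0.9, "s0": 0.1}},
    {("s0", "a"): {"s1": 0.2, "s0": 0.8}},
]
posterior = belief_update([0.5, 0.5], envs, "s0", "a", "s1")
print(posterior)  # roughly [9/11, 2/11]
```

Observing s1 shifts the belief toward the first environment, which makes that move more likely. The cost of one update grows with the number of environments, not with the size of the state space.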