Abstract:
This work investigates Monte-Carlo planning for agents that act in stochastic (and potentially large) environments and may have multiple objectives whose priorities are not known a priori or are not easy to quantify. We propose Convex Hull Monte-Carlo Tree-Search, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration, as a solution to planning with multiple objectives in large environments. Moreover, we consider how to pose the problem of multi-objective planning as a contextual multi-armed bandit problem, giving a principled motivation for how to select actions from the view of contextual regret.
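
As a rough illustration of planning when objective priorities are unknown, the sketch below keeps a set of candidate value vectors and picks the best one once a weight vector over objectives is supplied. It assumes a linear weighting of objectives and treats the weight vector as the "context"; both are assumptions made for illustration. All names (`dominates`, `pareto_prune`, `best_for_context`) are hypothetical, and Pareto pruning is used here as a simpler stand-in for a convex hull operation, not as the method proposed in this work.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]  # one expected return per objective


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_prune(vectors: Sequence[Vector]) -> List[Vector]:
    """Drop value vectors that are Pareto-dominated by another candidate.

    Note: this is a simplification. A convex hull would prune further,
    keeping only vectors that are optimal for some linear weighting.
    """
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


def scalarize(v: Vector, weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of the objectives (an assumed utility form)."""
    return sum(w * x for w, x in zip(weights, v))


def best_for_context(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Given a weight vector (the assumed 'context'), return the best candidate."""
    return max(vectors, key=lambda v: scalarize(v, weights))


# Hypothetical value vectors for four candidate actions over two objectives.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4), (0.3, 0.3)]
frontier = pareto_prune(candidates)

# Different priorities select different actions from the same frontier.
choice_a = best_for_context(frontier, (0.9, 0.1))
choice_b = best_for_context(frontier, (0.2, 0.8))
```

In this toy example, `(0.4, 0.4)` survives Pareto pruning yet is never the maximizer under any linear weighting, which hints at why a convex hull can be a more compact representation than a Pareto front when utilities are assumed linear.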