Abstract:
Policy gradient methods are a powerful class of algorithms in reinforcement learning (RL). Recently, several variance-reduced policy gradient methods have been developed to improve sample efficiency, attaining a near-optimal sample complexity of $O(\epsilon^{-3})$ for finding an
$\epsilon$-stationary point of the non-concave performance function in model-free RL.
However, the practical performance of these variance-reduced policy gradient methods does not match their near-optimal sample complexity,
because they rely on large batch sizes and restrictive learning rates to attain this complexity.
In this paper, we therefore propose a class of efficient momentum-based policy gradient methods that use adaptive learning rates and do not require large batches.
Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on the importance sampling technique.
We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method that exploits Hessian information.
In the theoretical analysis, we prove that both algorithms attain the sample complexity of $O(\epsilon^{-3})$, matching the best existing policy gradient methods.
In the experiments, we demonstrate the effectiveness of our algorithms on several benchmark tasks.
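
For concreteness, the following is a minimal, self-contained sketch of a STORM-style recursive-momentum policy gradient update with an importance-sampling correction, the kind of update IS-MBPG builds on. The one-step Gaussian toy problem, the constant step size and momentum, and the weight clipping are illustrative assumptions made for this sketch only; they are not the paper's algorithm, hyperparameters, or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an illustrative assumption, not from the paper): a one-step
# Gaussian policy a ~ N(theta, 1) with reward r = -(a - 3)^2, so the optimal
# policy mean is theta* = 3.

def sample_traj(theta):
    a = rng.normal(theta, 1.0)
    return a, -(a - 3.0) ** 2

def log_prob(a, theta):
    # Gaussian log-density, up to an additive constant
    return -0.5 * (a - theta) ** 2

def pg_estimate(a, r, theta):
    # REINFORCE estimator: grad_theta log pi(a | theta) * r
    return (a - theta) * r

theta = 0.0
a, r = sample_traj(theta)
u = pg_estimate(a, r, theta)        # initial gradient estimate
eta, beta = 0.01, 0.2               # constant values chosen only for this toy

for t in range(5000):
    theta_old = theta
    theta = theta + eta * u         # gradient ascent on the expected return
    a, r = sample_traj(theta)       # a single trajectory per iteration
    g_new = pg_estimate(a, r, theta)
    g_old = pg_estimate(a, r, theta_old)
    # trajectory-level importance weight pi(a | theta_old) / pi(a | theta),
    # clipped here only to keep the toy example numerically tame
    w = min(np.exp(log_prob(a, theta_old) - log_prob(a, theta)), 10.0)
    # STORM-style recursive momentum with the importance-sampling correction
    u = beta * g_new + (1.0 - beta) * (u + g_new - w * g_old)

print(f"learned policy mean: {theta:.2f} (optimum is 3.0)")
```

The momentum term reuses the previous estimate $u$ while the importance weight corrects the old-parameter gradient for having been sampled under the new policy, so no large batch is needed at any iteration.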