Distributional Reinforcement Learning via Moment Matching

Abstract: We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. As in the distributional RL literature, we formulate a method that learns a finite set of statistics from each return distribution via neural networks. Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which both restricts the representation and makes the predefined statistics difficult to maintain. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide a finite-sample analysis of the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
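
To make the MMD objective mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth, a single scalar reward, and the biased (V-statistic) estimator of squared MMD; the function names `gaussian_kernel`, `mmd_squared`, and `bellman_targets` and all numeric values are hypothetical and chosen only for illustration. In an actual agent the particles would be outputs of a neural network and the loss would be minimized by backpropagation.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays (assumed kernel choice)."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))

def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sets of samples."""
    k_xx = gaussian_kernel(particles, particles, bandwidth).mean()
    k_yy = gaussian_kernel(targets, targets, bandwidth).mean()
    k_xy = gaussian_kernel(particles, targets, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

def bellman_targets(reward, gamma, next_particles):
    """Push next-state pseudo-samples through the Bellman backup r + gamma * z."""
    return reward + gamma * next_particles

# Hypothetical pseudo-samples of the return at the current and next state-action pair.
particles = np.array([0.0, 1.0, 2.0])
next_particles = np.array([0.5, 1.5, 2.5])

targets = bellman_targets(reward=1.0, gamma=0.9, next_particles=next_particles)
loss = mmd_squared(particles, targets)
print(f"squared MMD between particles and Bellman targets: {loss:.4f}")
```

With a Gaussian kernel, the series expansion of the exponential contains every power of the samples, which gives one informal way to read the abstract's remark about implicitly matching all orders of moments. The estimate is zero when the two sample sets coincide and positive when they differ.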

18/07/2021

Thanh Nguyen-Tang, Sunil Gupta, Svetha Venkatesh

Similar Papers

Ensemble Bootstrapping for Q-Learning

Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Yaqi Duan, Chi Jin, Zhiyuan Li

Theory, RL, Decisions and Control Theory

ReLU Regression with Massart Noise

Ilias Diakonikolas, Jong Ho Park, Christos Tzamos

On the Estimation Bias in Double Q-Learning

Zhizhou Ren, Guangxiang Zhu, Hao Hu, Beining Han, Jianglun Chen, Chongjie Zhang

Bayesian Distributional Policy Gradients

Luchen Li, A. Aldo Faisal

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

Optimization, Stochastic Optimization

Non-Crossing Quantile Regression for Distributional Reinforcement Learning

Fan Zhou, Jianing Wang, Xingdong Feng

Fourier Sparse Leverage Scores and Approximate Kernel Learning

Tamas Erdelyi, Cameron Musco, Christopher Musco

Stochastic Normalizing Flows

Hao Wu, Jonas Köhler, Frank Noe

Learning Near Optimal Policies with Low Inherent Bellman Error

Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Jian Li, Xuanyuan Luo, Mingda Qiao

learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics

Slice Sampling Reparameterization Gradients

David M Zoltowski, Diana Cai, Ryan Adams

optimization, machine learning, generative model

Kernel Conditional Density Operators

Ingmar Schuster, Mattes Mollenhauer, Stefan Klus, Krikamol Muandet

Bellman-consistent Pessimism for Offline Reinforcement Learning

Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

theory, reinforcement learning and planning, robustness

Efficient Statistical Tests: A Neural Tangent Kernel Approach

Sheng Jia, Ehsan Nezhadarya, Yuhuai Wu, Jimmy Ba

Bessel Smoothing and Multi-Distribution Property Estimation

Yi Hao, Ping Li

Distribution learning/testing, High-dimensional statistics, Information theory

Conservative Offline Distributional Reinforcement Learning

Yecheng Ma, Dinesh Jayaraman, Osbert Bastani

Robust learning under strong noise via SQs

Ioannis Anagnostides, Themis Gouleakis, Ali Marashian

Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

Avi Matza, Yuval Bistritz

Modeling the Second Player in Distributionally Robust Optimization

Paul Michel, Tatsunori Hashimoto, Graham Neubig

adversarial learning, deep learning, robustness, distributionally robust optimization

Non-asymptotic Error Bounds for Bidirectional GANs

Shiao Liu, Yunfei Yang, Jian Huang, Yuling Jiao, Yang Wang

deep learning, generative model

GMAC: A Distributional Perspective on Actor-Critic Framework

Daniel Nam, Younghoon Kim, Chan Park

Reinforcement Learning and Planning, Deep RL

Optimal Statistical Guarantees for Adversarially Robust Gaussian Classification
