Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

03/05/2021

Max B Paulus, Chris Maddison, Andreas Krause

Keywords: softmax, gumbel, rao-blackwell, rao, straightthrough, straight-through, gumbel-softmax

Abstract: Gradient estimation in models with discrete latent variables is a challenging problem, because the simplest unbiased estimators tend to have high variance. To counteract this, modern estimators either introduce bias, rely on multiple function evaluations, or use learned, input-dependent baselines. Thus, there is a need for estimators that require minimal tuning, are computationally cheap, and have low mean squared error. In this paper, we show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization without increasing the number of function evaluations. This provably reduces the mean squared error. We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
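The abstract describes the idea only at a high level. As an illustration (not the authors' code), the NumPy sketch below contrasts the plain straight-through Gumbel-Softmax (ST-GS) gradient with a Rao-Blackwellized variant. Both evaluate the downstream function once, at the hard one-hot sample D. The variant then replaces the single relaxed softmax Jacobian with its average over k Gumbel noise draws conditioned on D. The helper names, the temperature tau, the number of conditional draws k, the toy linear objective, and the use of the standard truncated-Gumbel construction for conditional sampling are all assumptions made for this sketch. The objective is accessed only through a hypothetical gradient callable grad_f.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def relaxed_jacobian(theta, g, tau):
    """Jacobian of softmax((theta + g) / tau) w.r.t. theta, with the noise g held fixed."""
    s = softmax((theta + g) / tau)
    return (np.diag(s) - np.outer(s, s)) / tau


def conditional_gumbels(theta, d, rng):
    """Sample Gumbel(0, 1) noise g conditioned on argmax(theta + g) == d.

    Assumption: the standard truncated-Gumbel construction, written with
    Exp(1) variables. The maximum is Gumbel(log Z); every other perturbed
    logit is a Gumbel truncated to lie below that maximum.
    """
    t = theta - theta.max()          # shift logits for stability; the noise is unaffected
    lam = np.exp(t)
    z = lam.sum()
    e = rng.exponential(size=theta.shape)
    x = -np.log(e / lam + e[d] / z)  # j != d: Gumbel(t_j) truncated below the max
    x[d] = np.log(z) - np.log(e[d])  # the max itself is Gumbel(log z)
    return x - t                     # convert perturbed logits back to noise


def sample_one_hot(theta, rng):
    """Draw a one-hot categorical sample D with the Gumbel-max trick."""
    g = rng.gumbel(size=theta.shape)
    d = int(np.argmax(theta + g))
    return d, g, np.eye(theta.size)[d]


def st_gumbel_softmax_grad(theta, grad_f, tau, rng):
    """ST-GS: forward pass with the hard D, backward pass through one relaxed sample."""
    _, g, one_hot = sample_one_hot(theta, rng)
    # Vector-Jacobian product: map df/dD back to the logits theta.
    return relaxed_jacobian(theta, g, tau).T @ grad_f(one_hot)


def gumbel_rao_grad(theta, grad_f, tau, k, rng):
    """Rao-Blackwellized ST-GS: average the relaxed Jacobian over k noise draws given D."""
    d, _, one_hot = sample_one_hot(theta, rng)
    upstream = grad_f(one_hot)       # still a single evaluation at D
    jac = np.mean([relaxed_jacobian(theta, conditional_gumbels(theta, d, rng), tau)
                   for _ in range(k)], axis=0)
    return jac.T @ upstream


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.zeros(3)
    w = np.array([1.0, -1.0, 0.0])   # toy objective f(D) = w @ D, so df/dD = w
    st = np.array([st_gumbel_softmax_grad(theta, lambda D: w, 1.0, rng) for _ in range(4000)])
    gr = np.array([gumbel_rao_grad(theta, lambda D: w, 1.0, 10, rng) for _ in range(4000)])
    print("means:", st.mean(0), gr.mean(0))
    print("total variance:", st.var(0).sum(), gr.var(0).sum())
```

Conditioning on D only averages out noise that ST-GS samples once. Both estimators therefore have the same expectation, and the variance of the averaged version can be no larger, because the within-D part of the variance shrinks by a factor of k. Extra conditional draws cost only softmax evaluations, not evaluations of f, which is consistent with the abstract's claim that the number of function evaluations does not increase.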

This talk and the accompanying paper were published at the ICLR 2021 virtual conference.

Similar Papers

06/12/2021

Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization

Jialun Zhang, Salar Fattahi, Richard Y Zhang

Keywords: optimization

18/07/2021

High-Dimensional Gaussian Process Inference with Derivatives

Filip de Roos, Alexandra Gessner, Philipp Hennig

Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

06/12/2020

Variance reduction for Random Coordinate Descent-Langevin Monte Carlo

Zhiyan Ding, Qin Li

13/04/2021

Direct loss minimization for sparse gaussian processes

Yadi Wei, Rishit Sheth, Roni Khardon

06/12/2020

Noise-Contrastive Estimation for Multivariate Point Processes

Hongyuan Mei, Tom Wan, Jason Eisner

14/06/2020

A Graduated Filter Method for Large Scale Robust Estimation

Huu Le, Christopher Zach

Keywords: robust fitting, bundle adjustment, non-convex, poor local minima, non-linear least squares, graduated non-convexity

18/07/2021

Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

Artem Artemev, David Burt, Mark van der Wilk

Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

26/04/2020

Extreme Classification via Adversarial Softmax Approximation

Robert Bamler, Stephan Mandt

Keywords: Extreme classification, negative sampling

06/12/2020

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Dheeraj Nagaraj, Xian Wu, Guy Bresler, Prateek Jain, Praneeth Netrapalli

04/08/2021

Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization

Mathieu Even, Laurent Massoulie

06/12/2021

Heavy Ball Neural Ordinary Differential Equations

Hedi Xia, Vai Suliafu, Hangjie Ji, Tan Nguyen, Andrea Bertozzi, Stanley Osher, Bao Wang

Keywords: deep learning, optimization, machine learning, vision

09/07/2020

On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems

Dan Garber

Keywords: Convex optimization, Online learning, Stochastic optimization

03/05/2021

Implicit Gradient Regularization

David Barrett, Benoit Dherin

Keywords: regularization, theory, deep learning, implicit regularization, deep learning theory, theoretical issues in deep learning

06/12/2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

Edward Moroshko, Blake Woodworth, Suriya Gunasekar, Jason Lee, Nati Srebro, Daniel Soudry

06/12/2020

Truncated Linear Regression in High Dimensions

Constantinos Daskalakis, Dhruv Rohatgi, Emmanouil Zampetakis

26/08/2020

AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

Ruqi Zhang, A. Feder Cooper, Christopher De Sa

20/07/2020

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM

Bao Wang, Quanquan Gu, March Boedihardjo, Lingxiao Wang, Farzin Barekat, Stanley J. Osher

06/12/2021

Spatio-Temporal Variational Gaussian Processes

Oliver Hamelijnck, William Wilkinson, Niki Loppi, Arno Solin, Theodoros Damoulas

Keywords: generative model, kernel methods

03/05/2021

Gradient Origin Networks

Sam Bond-Taylor, Chris G Willcocks

Keywords: Implicit Representation, Generative Models, Deep Learning

12/07/2020

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Alexander Shevchenko, Marco Mondelli

Keywords: Deep Learning - Theory

06/12/2021

Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery

Lijun Ding, Liwei Jiang, Yudong Chen, Qing Qu, Zhihui Zhu

06/12/2020

Optimal Variance Control of the Score-Function Gradient Estimator for Importance-Weighted Bounds

Valentin Liévin, Andrea Dittadi, Anders Christensen, Ole Winther

06/12/2021

On Density Estimation with Diffusion Models

Diederik Kingma, Tim Salimans, Ben Poole, Jonathan Ho

Keywords: optimization, generative model

18/07/2021

Self Normalizing Flows

T. Anderson Keller, Jorn Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Keywords: Deep Learning, Generative Models

06/12/2020

A novel variational form of the Schatten-$p$ quasi-norm

Paris Giampouras, Rene Vidal, Athanasios Rontogiannis, Benjamin Haeffele

12/07/2020

Low Bias Low Variance Gradient Estimates for Hierarchical Boolean Stochastic Networks

Adeel Pervez, Taco Cohen, Efstratios Gavves

Keywords: Deep Learning - Generative Models and Autoencoders

12/07/2020

Momentum Improves Normalized SGD

Ashok Cutkosky, Harsh Mehta

Keywords: Optimization - Non-convex

12/07/2020

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Jakub Swiatkowski, Kevin Roth, Bastiaan Veeling, Linh Tran, Joshua Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Keywords: Deep Learning - General

26/08/2020

A Unified Statistically Efficient Estimation Framework for Unnormalized Models

Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

13/04/2021

Localizing changes in high-dimensional regression models

Alessandro Rinaldo, Daren Wang, Qin Wen, Rebecca Willett, Yi Yu

06/12/2020

Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar

22/11/2021

SLURP: Side Learning Uncertainty for Regression Problems

Xuanlong Yu, Gianni Franchi, Emanuel Aldea

Keywords: Uncertainty estimation, Confidence estimation, Auxiliary model, Monocular depth, Optical flow

18/07/2021

Bias-Free Scalable Gaussian Processes via Randomized Truncations

Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss, John Cunningham

Keywords: Probabilistic Methods, Gaussian Processes and Bayesian non-parametrics

06/12/2021

Dual Parameterization of Sparse Variational Gaussian Processes

Vincent Adam, Paul Chang, Mohammad Emtiyaz Khan, Arno Solin

Keywords: optimization, generative model, kernel methods

26/08/2020

Integrals over Gaussians under Linear Domain Constraints

Alexandra Gessner, Oindrila Kanjilal, Philipp Hennig

26/08/2020

Gaussian-Smoothed Optimal Transport: Metric Structure and Statistical Efficiency

Ziv Goldfeld, Kristjan Greenewald

18/07/2021

Marginalized Stochastic Natural Gradients for Black-Box Variational Inference

Geng Ji, Debora Sujono, Erik Sudderth

Keywords: Probabilistic Methods, Approximate Inference

06/12/2021

Interpolation can hurt robust generalization even when there is no noise

Konstantin Donhauser, Alexandru Tifrea, Michael Aerni, Reinhard Heckel, Fanny Yang

Keywords: theory, machine learning, robustness

26/08/2020

One Sample Stochastic Frank-Wolfe

Mingrui Zhang, Zebang Shen, Aryan Mokhtari, Hamed Hassani, Amin Karbasi

05/01/2021

Covariance-Free Partial Least Squares: An Incremental Dimensionality Reduction Method

Artur Jordao, Maiko Lie, Victor Hugo Cunha de Melo, William Robson Schwartz
