WaveGrad: Estimating Gradients for Waveform Generation

03/05/2021

Nanxin Chen, Yu Zhang, Heiga Zen, Ron Weiss, Mohammad Norouzi, William Chan

Keywords: gradient estimation, waveform generation, score matching, vocoder, diffusion, text-to-speech

Abstract: This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad.github.io/.
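The iterative refinement described in the abstract can be illustrated with a minimal sketch. It uses a generic DDPM-style update and is not the authors' implementation. The function names (`refine`, `make_oracle`), the six-step noise schedule `betas`, the placeholder `mel` array, and the oracle noise predictor that stands in for a trained network are all assumptions made for illustration.

```python
import numpy as np


def refine(noise_predictor, mel, num_samples, betas, rng):
    """Start from Gaussian white noise and refine it over len(betas) steps."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step
    y = rng.standard_normal(num_samples)  # initial white-noise waveform
    for n in reversed(range(len(betas))):
        noise_level = np.sqrt(alpha_bars[n])
        eps = noise_predictor(y, mel, noise_level)
        # Remove the predicted noise component (DDPM-style mean update).
        y = (y - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps) / np.sqrt(alphas[n])
        if n > 0:
            # Re-inject a smaller amount of noise on every step except the last.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(num_samples)
    return y


def make_oracle(target):
    """Hypothetical stand-in for a trained model: it already knows the clean signal."""
    def predictor(y, mel, noise_level):
        # Solve y = noise_level * target + sqrt(1 - noise_level^2) * eps for eps.
        return (y - noise_level * target) / np.sqrt(1.0 - noise_level ** 2)
    return predictor


# Toy usage: a 440 Hz tone at an assumed 16 kHz sample rate, six refinement steps.
t = np.arange(1600) / 16000.0
target = np.sin(2.0 * np.pi * 440.0 * t)
mel = np.zeros((80, 8))  # placeholder conditioning; the oracle ignores it
betas = np.linspace(1e-4, 0.5, 6)  # arbitrary illustrative schedule
out = refine(make_oracle(target), mel, target.size, betas, np.random.default_rng(0))
print(np.max(np.abs(out - target)))
```

With a real model, `noise_predictor` would be a network conditioned on the mel-spectrogram, and the length of `betas` would set the trade-off between inference speed and sample quality.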

The talk and the corresponding paper were published at the ICLR 2021 virtual conference.

Similar Papers

02/11/2020

On multitask loss function for audio event detection and localization

Huy Phan, Lam Pham, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin, Alfred Mertins

15:16

03/05/2021

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Durk Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

Keywords: score matching, stochastic differential equations, score-based generative models, diffusion, generative models

15:27

02/11/2020

Ensemble of sequence matching networks for dynamic sound event localization, detection, and tracking

Thi Ngoc Tho Nguyen, Douglas L. Jones, Woon Seng Gan

11:06

02/11/2020

On the effectiveness of spatial and multi-channel features for multi-channel polyphonic sound event detection

Thi Ngoc Tho Nguyen, Douglas L. Jones, Woon Seng Gan

12:29

03/05/2021

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

Keywords: diffusion probabilistic models, generative models, speech synthesis, audio synthesis

15:12

02/11/2020

Sound event localization and detection based on CRNN using rectangular filters and channel rotation data augmentation

Francesca Ronchini, Daniel Arteaga, Andrés Pérez-López

12:51

03/05/2021

Neural Synthesis of Binaural Speech From Mono Audio

Alexander Richard, Dejan Markovic, Israel Gebru, Steven Krenn, Gladstone A Butler, Fernando Torre, Yaser Sheikh

Keywords: speech generation, speech processing, binaural speech, neural sound synthesis, sound spatialization, binaural audio

15:00

18/07/2021

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, Mikhail Kudinov

Keywords: Applications, Audio and Speech Processing

5:12

13/04/2021

Variational inference for nonlinear ordinary differential equations

Sanmitra Ghosh, Paul Birrell, Daniela De Angelis

3:05

18/07/2021

Detection of Signal in the Spiked Rectangular Models

Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

Keywords: Theory, Statistical Learning Theory

5:06

22/11/2021

Taming Visually Guided Sound Generation

Vladimir Iashin, Esa Rahtu

Keywords: multi-modal learning, audio generation, video understanding, transformer, VQVAE, MelGAN, perceptual loss, generation metrics, VGGSound, VAS

9:54

06/12/2021

On Density Estimation with Diffusion Models

Diederik Kingma, Tim Salimans, Ben Poole, Jonathan Ho

Keywords: optimization, generative model

9:53

03/05/2021

Learning with Feature-Dependent Label Noise: A Progressive Approach

Yikai Zhang, Songzhu Zheng, Pengxiang Wu, Mayank Goswami, Chao Chen

Keywords: Noisy Label, Classification, Deep Learning

10:37

03/05/2021

Learning Energy-Based Models by Diffusion Recovery Likelihood

Ruiqi Gao, Yang Song, Ben Poole, Yingnian Wu, Durk Kingma

Keywords: recovery likelihood, EBM, energy-based model, generative model, HMC, Langevin dynamics, MCMC, diffusion process

6:03

06/12/2021

Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

Erik Englesson, Hossein Azizpour

9:42

18/07/2021

SoundDet: Polyphonic Moving Sound Event Detection and Localization from Raw Waveform

Yuhang He, Niki Trigoni, Andrew Markham

Keywords: Applications, Audio and Speech Processing

4:34

11/10/2020

Music Structure Analysis Based on an LSTM-HSMM Hybrid Model

Go Shibata, Ryo Nishikimi, Kazuyoshi Yoshii

Keywords: Musical features and properties, Structure, segmentation, and form

4:06

02/02/2021

Interactive Speech and Noise Modeling for Speech Enhancement

Chengyu Zheng, Xiulian Peng, Yuan Zhang, Sriram Srinivasan, Yan Lu

14:47

02/11/2020

DCASE 2020 Task2: Anomalous sound detection using relevant spectral feature and focusing techniques in the unsupervised learning scenario

Jihwan Park, Sooyeon Yoo

11:06

02/11/2020

Temporal sub-sampling of audio feature sequences for automated audio captioning

Khoa Nguyen, Konstantinos Drossos, Tuomas Virtanen

14:09

26/04/2020

High Fidelity Speech Synthesis with Adversarial Networks

Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

Keywords: text-to-speech, speech synthesis, audio synthesis, GANs, generative adversarial networks, implicit generative models

15:07

06/12/2021

Estimating High Order Gradients of the Data Distribution by Denoising

Chenlin Meng, Yang Song, Wenzhe Li, Stefano Ermon

Keywords: generative model

7:31

02/11/2020

Self-supervised classification for detecting anomalous sounds

Ritwik Giri, Srikanth V. Tenneti, Fangzhou Cheng, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

13:28

02/11/2020

Two-stage domain adaptation for sound event detection

Liping Yang, Junyong Hao, Zhenwei Hou, Wang Peng

13:16

06/12/2021

A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Chin-Wei Huang, Jae Hyun Lim, Aaron Courville

Keywords: deep learning, generative model

15:37

11/10/2020

Hierarchical Timbre-painting and Articulation Generation

Michael M Michelashvili, Lior Wolf

Keywords: Domain knowledge, Machine learning/Artificial intelligence for music, Representations of music, MIR fundamentals and methodology, Music signal processing, MIR tasks, Music synthesis and transformation

4:04

11/10/2020

Content Based Singing Voice Source Separation via Strong Conditioning Using Aligned Phonemes

Gabriel Meseguer Brocal, Geoffroy Peeters

Keywords: MIR tasks, Sound source separation, Evaluation, datasets, and reproducibility, Novel datasets and use cases, MIR fundamentals and methodology, Lyrics and other textual data, web mining, and natural language processing, Multimodality

4:08

18/07/2021

Parallel and Flexible Sampling from Autoregressive Models via Langevin Dynamics

Vivek Jayaram, John Thickstun

Keywords: Optimization, Convex Optimization, Deep Learning, Generative Models, Algorithms, Large Scale Learning; Algorithms, Regression; Algorithms, Sparsity and Compressed Sensing; Algorithms, Stru

5:17

18/07/2021

Optimal Estimation of High Dimensional Smooth Additive Function Based on Noisy Observations

Fan Zhou, Ping Li

Keywords: Theory, Statistical Learning Theory

5:47

06/12/2021

Improved Regularization and Robustness for Fine-tuning in Neural Networks

Dongyue Li, Hongyang Zhang

Keywords: deep learning, machine learning, robustness, vision, transfer learning

12:03

26/04/2020

From Variational to Deterministic Autoencoders

Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael Black, Bernhard Schölkopf

Keywords: Unsupervised learning, Generative Models, Variational Autoencoders, Regularization

4:59

02/11/2020

Conformer-based sound event detection with semi-supervised learning and data augmentation

Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda

14:29

03/05/2021

Adversarial score matching and improved sampling for image generation

Alexia Jolicoeur-Martineau, Rémi Piché-Taillefer, Ioannis Mitliagkas, Remi Combes

Keywords: score matching, adversarial, generative model, GAN, Langevin dynamics

4:56

02/02/2021

Modeling the Compatibility of Stem Tracks to Generate Music Mashups

Jiawen Huang, Ju-Chiang Wang, Jordan B. L. Smith, Xuchen Song, Yuxuan Wang

19:31

06/12/2021

PCA Initialization for Approximate Message Passing in Rotationally Invariant Models

Marco Mondelli, Ramji Venkataramanan

Keywords: theory

13:57

14/06/2020

Discriminative Multi-Modality Speech Recognition

Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang

Keywords: multi-modal, audio-visual, speech recognition, lip reading, deep learning, eleatt-gru

1:01

11/10/2020

Generating Music with a Self-correcting Non-chronological Autoregressive Model

Wayne Chi, Prachi Kumar, Suri Yaddanapudi, Suresh Rahul, Umut Isik

Keywords: Domain knowledge, Machine learning/Artificial intelligence for music, Applications, Music composition, performance, and production, Representations of music, MIR tasks, Music synthesis and transformation

4:33

06/12/2021

SNIPS: Solving Noisy Inverse Problems Stochastically

Bahjat Kawar, Gregory Vaksman, Michael Elad

12:27

26/04/2020

Deep Audio Priors Emerge From Harmonic Convolutional Networks

Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

Keywords: Audio, Deep Prior

5:13

02/11/2020

Searching for efficient network architectures for acoustic scene classification

Yuzhong Wu, Tan Lee

14:37