Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Abstract: Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore render model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To further exploit it, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate the proposed approach consistently leads to improved performance.

06/12/2021

reinforcement learning, Wasserstein distance, Frobenius norm, Kullback-Leibler divergence, trust region, policy gradient, projection

5:13

02/02/2021

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin

Comments

Similar Papers

Robust Deep Reinforcement Learning through Adversarial Loss

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and Luca Daniel, Tsui-Wei Weng

Keywords Abstract Paper

reinforcement learning and planning, robustness, adversarial robustness and security

Active deep Q-learning with demonstration

Si-An Chen,Hsuan-Tien Lin, Voot Tangkaratt, Masashi Sugiyam

Keywords Abstract Paper

Exploring supervised and unsupervised rewards in machine translation

Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia

Keywords Abstract Paper

GaussianPath:A Bayesian Multi-Hop Reasoning Framework for Knowledge Graph Reasoning

Guojia Wan, Bo Du

Keywords Abstract Paper

MOPO: Model-based Offline Policy Optimization

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Abstract Paper

Policy Learning Using Weak Supervision

Jingkang Wang, Hongyi Guo, Zhaowei Zhu, Yang Liu

Keywords Abstract Paper

reinforcement learning and planning

Differentiable Trust Region Layers for Deep Reinforcement Learning

Fabian Otto, Philipp Becker, Vien A Ngo and Hanna Ziesche, Gerhard Neumann

Keywords Abstract Paper

reinforcement learning, Wasserstein distance, Frobenius norm, Kullback-Leibler divergence, trust region, policy gradient, projection

Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Haiyan Yin, Jianda Chen, Sinno Jialin Pan, Sebastian Tschiatschek

Keywords Abstract Paper

Reward-Constrained Behavior Cloning

Zhaorong Wang, Meng Wang, Jingqi Zhang and Yingfeng Chen, Chongjie Zhang

Keywords Abstract Paper

Machine Learning, Deep Reinforcement Learning, Reinforcement Learning, Constraint Optimization

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Huan Zhang, Hongge Chen, Chaowei Xiao and Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh

Keywords Abstract Paper

Optimistic Exploration even with a Pessimistic Initialisation

Tabish Rashid, Bei Peng, Wendelin Boehmer, Shimon Whiteson

Keywords Abstract Paper

Reinforcement Learning, Exploration, Optimistic Initialisation

When Is Generalizable Reinforcement Learning Tractable?

Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar

Keywords Abstract Paper

reinforcement learning and planning, generative model, representation learning

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Sebastian Curi, Felix Berkenkamp, Andreas Krause

Keywords Abstract Paper

Representations for Stable Off-Policy Reinforcement Learning

Dibya Ghosh, Marc Bellemare

Keywords Abstract Paper

Reinforcement Learning - General

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

Sebastian Curi, Ilija Bogunovic, Andreas Krause

Keywords Abstract Paper

Reinforcement Learning and Planning

Selective Pseudo-Labeling with Reinforcement Learning for Semi-Supervised Domain Adaptation

Bingyu Liu, Yuhong Guo, Jieping Ye, Weihong Deng

Keywords Abstract Paper

semi-supervised domain adaptation, reinforcement learning, pseudo-label

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Keywords Abstract Paper

Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

Yu Wang, Jingyang Lin, Jingjing Zou and Yingwei Pan, Ting Yao, Tao Mei

Keywords Abstract Paper

self-supervised learning, contrastive learning

Adaptive Prior-Dependent Correction Enhanced Reinforcement Learning for Natural Language Generation

Wei Cheng, Ziyan Luo, Qiyue Yin

Keywords Abstract Paper

Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration

Lulu Zheng, Jiarui Chen, Jianhao Wang and Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords Abstract Paper

reinforcement learning and planning

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine

Keywords Abstract Paper

Imitation Learning, Reinforcement Learning

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

Tuomas Oikarinen, Wang Zhang, Alexandre Megretski and
Luca Daniel, Tsui-Wei Weng

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Tianhe (Kevin) Yu, Garrett Thomas, Lantao Yu and
Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Keywords Paper

Keywords Paper

Fabian Otto, Philipp Becker, Vien A Ngo and
Hanna Ziesche, Gerhard Neumann

Keywords Paper

Keywords Paper

Zhaorong Wang, Meng Wang, Jingqi Zhang and
Yingfeng Chen, Chongjie Zhang

Keywords Paper

Huan Zhang, Hongge Chen, Chaowei Xiao and
Bo Li, Mingyan Liu, Duane Boning, Cho-Jui Hsieh

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Yu Wang, Jingyang Lin, Jingjing Zou and
Yingwei Pan, Ting Yao, Tao Mei

Keywords Paper

Keywords Paper

Lulu Zheng, Jiarui Chen, Jianhao Wang and
Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar and
Weihao Kong, Emma Brunskill

Keywords Paper

Roberta Raileanu, Maxwell Goldstein, Denis Yarats and
Ilya Kostrikov, Rob Fergus

Keywords Paper

Keywords Paper

Keywords Paper

Keywords Paper

Anuj Mahajan, Mikayel Samvelyan, Lei Mao and
Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Keywords Paper

Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang

Keywords Paper

Keywords Paper

Keywords Paper

Kevin Li, Abhishek Gupta, Ashwin D Reddy and
Vitchyr Pong, Aurick Zhou, Justin Yu, Sergey Levine

Keywords Paper

Yuqian Jiang, Suda Bharadwaj, Bo Wu and
Rishi Shah, Ufuk Topcu, Peter Stone

Keywords Paper

Keywords Paper