Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Abstract: In recent years, there has been great interest, along with many challenges, in applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. These problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework called GoalRec. Inspired by the ideas of world models (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It can generalize to the various goals that the recommender may have, and it disentangles the stochastic environmental dynamics from the high-variance reward signals. As a part of the value function, a high-capacity reward-independent world model, free from the sparse and high-variance reward signals, is trained to simulate complex environmental dynamics under a certain goal. Based on the predicted environmental dynamics, the disentangled universal value function depends on the user's future trajectory instead of a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches with respect to the above three practical challenges in a series of simulations and a real application.

06/12/2020

Kai Wang, Zhene Zou, Qilin Deng, Jianrong Tao, Runze Wu, Changjie Fan, Liang Chen, Peng Cui

Similar Papers

Learning Guidance Rewards with Trajectory-space Smoothing

Tanmay Gangwani, Yuan Zhou, Jian Peng

Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Hongyao Tang, Zhaopeng Meng, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Jianye Hao

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu, Weixun Wang, Hangtian Jia, Yixiang Wang, Yingfeng Chen, Jianye Hao, Feng Wu, Changjie Fan

Outcome-Driven Reinforcement Learning via Variational Inference

Tim G. J. Rudner, Vitchyr Pong, Rowan McAllister, Yarin Gal, Sergey Levine

Keywords: reinforcement learning and planning, generative model

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Michael Gimelfarb, Andre Barreto, Scott Sanner, Chi-Guhn Lee

Keywords: reinforcement learning and planning, representation learning, transfer learning

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau

The Value Equivalence Principle for Model-Based Reinforcement Learning

Christopher Grimm, Andre Barreto, Satinder Singh, David Silver

Towards Robust Bisimulation Metric Learning

Mete Kemertas, Tristan Aumentado-Armstrong

Keywords: reinforcement learning and planning, robustness, representation learning

Goal-directed Generation of Discrete Structures with Conditional Generative Models

Amina Mollaysa, Brooks Paige, Alexandros Kalousis

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

Tengyu Xu, Yingbin Liang, Guanghui Lan

Keywords: Theory, RL, Decisions and Control Theory

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Aaron Sonabend, Junwei Lu, Leo Anthony Celi, Tianxi Cai, Peter Szolovits

Local policy search with Bayesian optimization

Sarah Müller, Alexander von Rohr, Sebastian Trimpe

Keywords: theory, optimization, reinforcement learning and planning, active learning

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

Dexter R.R. Scobee, S. Shankar Sastry

Keywords: learning from demonstration, inverse reinforcement learning, constraint inference

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning

Tianpei Yang, Weixun Wang, Hongyao Tang, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Yingfeng Chen, Yujing Hu, Changjie Fan, Chengwei Zhang

Keywords: reinforcement learning and planning, transfer learning

Explicable Reward Design for Reinforcement Learning Agents

Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

Keywords: optimization, reinforcement learning and planning, interpretability

Near Optimal Reward-Free Reinforcement Learning

Zhang Zihan, Simon Du, Xiangyang Ji

Keywords: Theory, RL, Decisions and Control Theory

Deep active inference agents using Monte-Carlo methods

Zafeirios Fountas, Noor Sajid, Pedro Mediano, Karl Friston

KERL: A knowledge-guided reinforcement learning model for sequential recommendation

Pengfei Wang, Yu Fan, Long Xia, Wayne Xin Zhao, Shaozhang Niu, Jimmy Huang

Keywords: sequential recommendation, reinforcement learning, knowledge graph

Single Episode Policy Transfer in Reinforcement Learning

Jiachen Yang, Brenden Petersen, Hongyuan Zha, Daniel Faissol

Keywords: transfer learning, reinforcement learning

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin

FISSA: Fusing item similarity models with self-attention networks for sequential recommendation

Jing Lin, Weike Pan, Zhong Ming

Keywords: Item Similarity Models, Sequential Recommendation, Gating Networks, Self-Attention

MOReL: Model-Based Offline Reinforcement Learning

Ruiyang Ren, Zhaoyang Liu, Yaliang Li, Wayne Xin Zhao, Hui Wang, Bolin Ding, Ji-Rong Wen

Kevin Yang, Tianjun Zhang, Chris Cummins, Brandon Cui, Benoit Steiner, Linnan Wang, Joseph Gonzalez, Dan Klein, Yuandong Tian

Xiangyu Zhao, Changsheng Gu, Haoshenglun Zhang, Xiwang Yang, Xiaobing Liu, Jiliang Tang, Hui Liu

Haisheng Su, Weihao Gan, Wei Wu, Yu Qiao, Junjie Yan

Alex Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

Ben Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Anima Anandkumar

Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg