Abstract:
Recommendation systems operate in a highly stochastic and non-stationary environment. The amount of available user-specific information varies, and users' interests themselves change over time. This combination creates a dynamic setting in which a single solution will rarely remain optimal unless it can keep up with these changes. Moreover, one system may perform better than another depending on the situation at hand, which makes the choice of which system to deploy even more difficult. We address these problems using the Hierarchical Reinforcement Learning framework. Our proposed meta-bandit acts as a policy over options, where each option maps to a pre-trained, independent recommender system. The meta-bandit learns online and selects a recommender according to the context, adjusting as conditions change. We conducted experiments on real data and found that our approach adapts to the dynamics of users' changing interests. We also show that it outperforms each of the recommenders individually, as well as an ensemble of them.
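To make the selection mechanism concrete, the sketch below shows one way a contextual meta-bandit over pre-trained recommenders could be wired up. The abstract does not name the bandit algorithm, so this uses standard LinUCB purely as a stand-in; `context_features`, `recommend`, and `observe_feedback` are hypothetical interfaces introduced for illustration, not the paper's actual API.

```python
import numpy as np

class LinUCBMetaBandit:
    """Minimal contextual meta-bandit: each arm (option) is a
    pre-trained recommender; LinUCB picks which one to invoke.
    A sketch under stated assumptions, not the paper's exact method."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x):
        """Choose the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge-regression estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Online update of the chosen arm's sufficient statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Hypothetical serving loop (all names below are assumed interfaces):
# bandit = LinUCBMetaBandit(n_arms=len(recommenders), dim=CONTEXT_DIM)
# for user in event_stream():
#     x = context_features(user)                  # assumed feature extractor
#     arm = bandit.select(x)
#     items = recommenders[arm].recommend(user)   # assumed recommender API
#     reward = observe_feedback(user, items)      # e.g., click = 1, no click = 0
#     bandit.update(arm, x, reward)
```

Because each arm maintains its own statistics, the meta-bandit can shift traffic toward whichever recommender is currently performing best for the observed context, which is the behavior the abstract describes.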